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GENERAL EDITOR’S FOREWORD 
by 

Arthur Porter, ph.d., m.i.e.e. 

Professor of Industrial Engineering, University of Toronto; 

Formerly Dean of Engineering, University of Saskatchewan, Canada; 

Professor of Light Electrical Engineering, 

Imperial College of Science and Technology, London 

This volume is one of a series of texts in electrical engineering, and 
related subjects, the main object of which is to present advances in these 
subjects on the one hand, and to re-evaluate basic principles, in the 
light of recent developments, on the other. The books will be of special 
interest to research, workers, postgraduates, and advanced under¬ 
graduates in electrical science and applied physics. 

The publication of such a series is regarded as timely because of the 
increasing realisation that the average three or four years’ engineering 
degree course is inadequate for the basic training of research engineers. 
To correct this situation there has been a notable increase, during the 
past few years, of postgraduate courses in all branches of engineering 
in the Universities and Colleges of Advanced Technology of the United 
Kingdom, the British Commonwealth, and the United States. Strong 
official support for these courses is being given by the Department of 
Scientific and Industrial Research, London, and their importance 
cannot be over-emphasised. But such specialised courses require suit¬ 
able texts and many of them are seriously handicapped in this respect. 
This is not surprising in view of the proliferation of science and tech¬ 
nology and the recognition that new languages—one of which is the 
language of high-speed digital computers—have been born within the 
last decade. It is essential, moreover, that the authors of texts at ad- 
vanced levels should be actively engaged in research work because only 
I Vi the active participant working at the frontiers of his subject is in a 
position to assess the importance of new concepts and of the mathe- 
*3 matical and experimental techniques involved. I believe that this new 
series of electrical engineering texts will fulfil a real need and that it 
will be welcomed as a medium for re-evaluating fundamental engineer¬ 
ing principles in the light of recent advances with special emphasis on 
the needs of postgraduate students. 
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vi GENERAL EDITOR’S FOREWORD 

The present volume deals with the study of industrial operations and 
processes using large-scale data-processing and computer systems as 
simulators. It is perhaps the first book to be published on the subject. 
During recent years considerable advances have been made in the 
development of models of technological and sociological systems, 
especially those embodying information feedback, but a major difficulty 
has been the handling of the extensive computational work. Dr. Tocher 
has made notable contributions to both statistical mathematics, and to 
the art of simulation, especially in areas involving a deep understanding 
of probability theory. This book will be particularly welcomed by 
operational research workers and by research engineers interested in 
the optimisation of processes. The methods of operational research, 
especially those involving the application of digital computers in simu¬ 
lation toles, are proving of considerable value in the application of the 
scientific method to industrial and applied scientific problems. This book 
provides an authoritative introduction to these topics. 
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CHAPTER 1 


INTRODUCTION 

The subject-matter of this book stems from three origins and contains 
matters of interest to three groups of scientific workers. 

The first and most respectable origin lies in the theory of mathe¬ 
matical statistics. Before and for some time after the founding of the 
Royal Statistical Society in 1834, the subject of statistics consisted of 
the collection and display in numerical and graphical form of facts and 
figures from the fields of economics, actuarial science and allied des¬ 
criptive sciences. One of the most useful forms of display was the 
histogram or frequency chart and the transformation of statistics began 
when it was realised that the occurrence of such diagrams could be 
explained by invoking the theory of probability. 

The idea of a probability distribution had been established as a useful 
concept by mathematicians and has been studied by Laplace and Gauss, 
to name two of the most celebrated mathematicians interested. The idea 
that frequency charts could be explained as a practical consequence of 
the laws of probability applied to everyday matters seized the imagina¬ 
tion of the pioneers of mathematical statistics. Since a probability 
distribution is by its nature, in most instances, composed of an infinite 
number of items, and frequency charts by their nature are composed of 
a finite number of items, these latter had to be thought of as samples 
from an underlying theoretical probability distribution. The problem 
then raised itself at how to describe a probability distribution given 
only a sample from it. The mathematical difficulties of this seemed 
immense and such steps as were taken needed experimental verification 
to give the early workers confidence. Thus was born the sampling 
experiment. A close approximation to a probability distribution was 
created, samples were taken, combined and transformed in suitable 
ways and the resulting frequency chart of sampled values corhpared 
with the predictions of theory. 

Although mathematical techniques have developed to levels of 
sophistication that would astonish the early workers, the value of 
sampling experiments in mathematical statistics still remain^. The 
advent of automatic digital computers to perform the laborious calcula- 
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tions necessary has revitalised this as a possible approach to the solution 
of problems still beyond the reach of analysis. 

The second origin lay in the demands of applied mathematicians for 
methods of solving problems involving partial differential equations. 
These equations constantly arise in the mathematical formulation of 
numerous physical processes. Analytical solutions were found for a 
wide range of formal problems but practical problems involving com¬ 
plex or irregular boundary conditions could not yield to theoretical 
attack. The corresponding situations had arisen in ordinary differential 
equations and had been overcome by developing numerical techniques 
for solving such equations. The nature of the problem for partial 
differential equations made an attack on these lines far more difficult. 

A typical problem was the solution of the so-called diffusion equation, 
which arises, as its name implies, in the diffusion of gases as well as in 
the conduction of heat in a medium and many other physical systems. 
The characteristic of many of these systems was that the actual mechan¬ 
ism for the movement of the gas (or heat) involved a large number of 
particles behaving in a partly regular and partly irregular manner. 
Averaging over the particles enabled the random element to be elimi¬ 
nated and a deterministic description to be given. 

Now the theory of probability had studied formal models closely 
allied to a system of particles moving in this partly regular, partly 
random manner and had developed mathematical techniques for deal¬ 
ing with these problems which were studied under the name of random 
walks. The mathematical analysis of these problems gave rise to partial 
differential equations of the same type as the diffusion problem. Thus 
was born the idea of solving experimentally the diffusion and allied 
equations by random walks. The idea lay dormant for many years and 
was resurrected by Von Neumann and Ulam under the stress of the 
technological demands of the Second World War. 

These men conceived of the extension of the principle to seek solu¬ 
tions to difficult mathematical problems arising from deterministic 
problems by finding analogous problems in probability leading in their 
analysis to formally identical mathematical equations and then solving 
the probability problems practically by sampling experiments. By this 
time, the ultimate stochastic nature of the physical phenomena often 
present in the original problem was often forgotten or completely 
ignored. 

One of the simplest and most powerful applications of this idea, which 
they christened ‘Monte Carlo’, was the evaluation of a multi-dimensional 
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integral Consider the simplest case of evaluating the area of a bounded 
region. Surround the region with a square, scaling to make its side of 
unit length. Take a point in the area at random. The probability that 
this lies in the region of area A is simply A. Take a large number of 
points at random; the proportion of these lying in the area is an esti¬ 
mate of A. 

The whole idea can be generalised to higher dimensions and the 
result remains true. The great advantage of this method lies in the fact 
that for higher dimensions, conventional numerical techniques will 
require an enormous number of points to get any answer at all. The 
Monte Carlo method can get an answer with any number of points 
although of course the accuracy falls off as the number decreases. 

In fact, the accuracy is proportional to V«, n being the number of 
points. Conventional methods give an accuracy proportional to n i/d 
where d is the number of dimensions. 

The third origin lay in the new science of Operational Research. This 
set itself the tasks of applying scientific method in everyday life—to 
military problems during the last war and to problems of industry after 
it. In common with most sciences, it developed by building models of 
the systems it studied and using these models to give insight and some¬ 
times quantitative information about these systems. 

The outstanding difference between the subject-matter of conven¬ 
tional science and operational research lay in the greater variability of 
the phenomena studied. It was vital to bring regularity back into the 
description of these phenomena—to find a usable description of the 
variability. This was achieved, as had been done for economics and 
actuarial science in the previous century, by using probability theory. 

The models studied were primarily probabilistic and the theoretical 
difficulties of these models were immense. A special branch of prob¬ 
ability theory arose under the title of Queue Theory to deal with some 
of these problems, but the irregular nature of the boundary conditions 
and restrictions in the real problems could rarely be incorporated into 
the models that mathematical methods could solve. 

Once again the scientist turned to an experimental technique. Now 
the problem was phrased in a language description of the real world 
involving queues, stocks, machines and operating instructions. The 
subject of simulation, as it was called, took on an identity of its 
own. 

Yet this was nothing new. To a statistician the problem was nothing 
more than to find the sampling distribution of an intricately and irregu- 
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larly defined statistic and because of the intricate nature of the defini¬ 
tion, to do this by a sampling procedure. 

The common element in all these approaches to our subject is 
probability theory. In these circumstances, it is hardly necessary to 
stress that an understanding of at least the elements of the theory are 
essential to an understanding of the principles and practice of sampling. 

Any attempt to give a self-contained account of the theory necessary 
as part of the book would be self-defeating. It could be nothing more 
than a few formal definitions and readers ignorant of the subject might 
be tempted to read further with only such a brief introduction. There¬ 
fore it is assumed that the reader has the necessary grounding in ele¬ 
mentary probability theory. 

Likewise, the elements of the theory of statistics are assumed. The 
commonest distributions such as Normal, Student’s, x 2 , binomial, 
exponential, are all discussed without a long introduction and the 
X -test of goodness of fit is not explained, as this must surely be well 
known to all scientific workers. 

Since the basic process involved is the drawing of samples from 
different distributions, this receives first study. The problem is reduced 
to drawing a sample from a uniform distribution (or drawing an integer 
from a range) ‘at random’. 

The methods of solving this problem by both machinery and mathe¬ 
matical methods are discussed. 

This is followed by an account of the problems in determining a 
frequency distribution by sampling. 

The possibility of using sampling methods as practical methods of 
acquiring knowledge rests on the possibility of taking really large samples 
to reduce the effect of the inherent variability. This in turn is possible 
only because of the existence of automatic digital computers. 

It is assumed that one of these powerful devices is available and this 
raises the problem of describing how to use these machines for sampling. 
Again, this book is not the place to attempt to give details of the prob¬ 
lems of instructing any particular machine to perform the required 
calculation. 

However, the idea of a flow diagram as a means of describing the 
calculating procedure in a form suitable for transcription into a 
machine code is explained and forms a powerful means of reducing the 
volume of description of computing procedures. 

Probability distributions are described by means of a set of para¬ 
meters and the central problem of sampling is to specify the value of 
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such parameters given a sample from the distribution. This is the prob¬ 
lem of estimation, which receives full treatment. An estimate must be as 
accurate as possible, and methods of sampling and estimation that 
increase the accuracy of the estimate without a corresponding increase 
in the volume of computing are described. 

We are then ready to tackle the problems of simulation. First simple 
queues are dealt with, gradually increasing in difficulty until a general 
case is dealt with. 

More complex systems are then treated and a general technique 
evolved for dealing with any system. 

Finally the question of the design of such large-scale experiments is 
considered. 



CHAPTER 2 


SAMPLING FROM A DISTRIBUTION 
2.1 Introduction 

We have seen in the preceding chapter that the basic problem for 
simulation consists in sampling from statistical distributions, and this 
chapter will describe a variety of methods of performing this basic 
operation. 

There are two main types of distributions from which samples are 
required: those in which the statistical variable takes a continuous 
range of values giving rise to a continuous probability density function, 
and those in which the statistical variable can take a discrete number of 
values. The normal distribution is typical of the first class and the 
binomial distribution (the number of successes in a fixed number of 
similar independent trials) is an example of the latter. 

However, for practical purposes, we must work with approximations 
to continuous variables and have to be satisfied with samples from a 
grouped distribution represented by a histogram. If the different groups 
are given names, then the problem reduces to selecting a name for the 
appropriate frequency and consequently there is no distinction in 
practice between the two types of distribution. 

Although it is possible to discuss the meaning of probability at great 
length, the earlier remarks indicate that our purposes are met by a 
frequency concept. Thus our problem is to devise a process of selection 
from the distribution so that the results of the repetition of this process 
will give rise to a frequency distribution of sampled values that matches 
the frequency distribution required in our simulation. It is possible for 
there to be quite intricate probability rules relating successive sample 
values, but the most important case that will be considered in detail 
first is where these samples are independent of one another in the sense 
that the probability distribution for any given item from the sample 
is independent of the preceding sample values. 

To effect the random selection, some source of randomness is 
required, and to give a general solution to the problem we want to use 
a common source for all types of distribution. The most appropriate 
source of such randomness is from random numbers. These may be 
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supplied either by tables, specially constructed by means of randomising 
machines, or by the use of mathematical devices to generate long 
sequences of numbers that have similar properties to random num¬ 
bers, the so-called pseudo-random numbers. These techniques will each 
be pursued in later chapters. 

We consider first the elementary methods of selecting from a fre¬ 
quency distribution and show in a series of steps how this can be 
improved and mechanised. 

2.2 The ‘Top Hat’ Method 2 

The obvious method to obtain a sample from a frequency distribu¬ 
tion has often been used in early historical examples of simulation. It 
consists of representing each of the possible values of the statistical 
variable by means of a set of discs. The number of discs given any 
particular value is proportional to the frequency with which it occurs 
in the frequency distribution. The total number of discs used is limited 
in an upward direction by the problems of the physical space occupied 
by the discs and in a lower direction by the accuracy with which the 
frequencies must be approximated. The collection of discs is well 
shuffled in a hat or other receptacle and a disc drawn at random. The 
values on successive discs after drawing (each being replaced after it 
has been read) constitute the sample. 

As an example, suppose we wish to draw a sample from a normal 
distribution with mean 18 units and standard deviation 4 units. The 
first step is to decide on the width of grouping that will be tolerable. 
This is settled by the application and we suppose a class width of 2 units 
is chosen, and the class boundaries are taken symmetrically about the 
mean. 

The next step is to determine the probability of this random variable 
falling in the intervals — 1 to 1, 1 to 3, 3 to 5, . . ., 15 to 17, ... and 
these are obtained from a table of the normal distribution function. 
This is given in Table 1. 

If we decide to represent the frequencies to three decimal places, we 
will require 1,000 discs. The frequency of negative values is negligible. 
Thus we number 28 discs with the number 10 representing the class 
interval from 9-11, 2 with 6 representing the interval 5-7, etc. In 
practice, the rounding has to be adjusted to give the sum of the fre¬ 
quencies exactly 1,000 or a few discs rejected. From the table given one 
disc is rejected. 

This example raises a problem of dealing with the tails of distribu- 
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Table 1 

No. of discs 

Class 

Frequency 

{total 1,000) 

-1 

0-0000 

0 

1- 3 

0-0001 

0 

3- 5 

0-0005 

1 

5- 7 

0-0024 

2 

7- 9 

0-0092 

9 

9-11 

0-0279 

28 

11-13 

0-0655 

65 

13-15 

0-1210 

121 

15-17 

0-1747 

175 

17-19 

0-1974 

197 

19-21 

0-1747 

175 

21-23 

0-1210 

121 

23-25 

0-0655 

65 

25-27 

0-0279 

28 

27-29 

0-0092 

9 

29-31 

0-0024 

2 

31-33 

0-0005 

1 

33-35 

0-0001 

0 

^35 

0-0000 

0 



999 

tions which are unbounded. In order to represent such distributions 
correctly, we need an infinite number of discs so that the extreme tails 
with their correspondingly extreme small probabilities can be repre- 

sented at all. This lays 

a practical limit on 

the length of tail approxi- 


mated. 

However, this difficulty can be overcome in the following way. Let 
the top group of the distribution represent the frequency with which an 
item should lie beyond the last bounded group. If a disc representing 
this unbounded group is obtained, then a further sample is taken from 
a distribution that gives the proportion of the items having a given 
value conditional upon the item being in the upper tail. Thus to 
accommodate the possibility of sample values in the range greater than 
33 the following Table 2 of relative frequencies is constructed from 
more accurate tables of the normal integral. 

The primary labelling of discs is adjusted to allow for the extra 
unbounded class, and a secondary labelling made according to the 
second distribution. If the first sampling gives rise to an item belonging 
to the upper tail, another disc is selected at random and the second value 
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Table 2 


Class 

Frequency 

Relative 

frequency 


33-35 

0-0000677 

0-8635 

864 

35-37 

00000097 

0-1237 

124 

37-39 

0-0000009 

0-0115 

11 

^39 

0-0000001 

0-0013 

1 


0-0000784 

1-0000 

1,000 


on that is used as a direct sampled value. The lower tail can be dealt 
with similarly. The tails of the conditional distributions can be sampled 
in the same way and there is no limit to the accuracy obtainable. 

There are two main objections to this elementary method of sampling. 
The first is the very practical one that the whole sampling process is 
rather laborious and the rate of sampling is limited by the rate that 
discs can be shuffled and selected. As we shall see, it is essential for 
large-scale simulations to mechanise the arithmetic processes involved, 
and this implies for their effective execution that the process of random 
selection should also be mechanised. 

The second and more serious objection is that in practice the 
shuffling process that can be applied to these discs is not adequate and 
many trials have been made which show that samples drawn in this way 
are not in fact random. To illustrate the point, the following experiment 
was performed. 


Number 

Table 3 

(x — A) 

1 

51 

1 

2 

46 

16 

3 

38 

144 

4 

57 

49 

5 

60 

100 

6 

50 

0 

7 

58 

64 

8 

41 

81 

9 

51 

1 

10 

48 

4 

__ 

,—- 

-— 


500 

460 
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A set of 1,000 discs labelled with numbers from 1 to 10 (100 of each) 
were prepared and sampled in the way proposed. The result of 500 
drawings is given in Table 3. 

Column 1 gives the actual frequency of occurrences of the different 
symbols. The expected number of occurrences of each class is 50. 
Applying a i 2 test gives i 2 = 9-2 which is significant at the 1 % level. 
It is clear that the samples of 4, 5, 7 drawn from this distribution were 
markedly higher than expectation, whilst samples of 3, 8 were below 
expectation. 

2.3 A Mechanised Version of the ‘Top Hat’ Method 

This can best be understood if we consider how the selection of the 
discs in the ‘Top Hat’ method could be made reliably random. Suppos¬ 
ing in addition to the variate values inscribed in the discs, the discs are 
numbered serially. Our requirement is that on each drawing any one 
of the discs could be equally likely to be selected, and this implies that 
the serial number on the selected disc is equally likely to be any of those 
present. Thus the random selection can be made secure by ensuring the 
serial number of the disc drawing is a random number selected from a 
reputable source. The validity of the selection process now depends 
entirely on the validity of the source of random numbers used in the 
sampling process. 

This, of course, does not overcome the objection to the clumsy 
nature of the process, but gives the key to mechanised techniques for 
doing it. In effect, the collection of discs represent a function. This 
function has, as its independent variable, the serial number, and as its 
value, the variable inscribed on the disc. The whole apparatus of discs 
can be discarded and the process applied by using a table of values of 
random variables, each associated with a serial number. Since the serial 
numbers will be selected at random, there is no restraint on the arrange¬ 
ments of the values of the variable on the discs, and the simplest 
arrangement is to use the lowest serial numbers for the lowest variable 
value, the next serial numbers for the next lowest variable and so on, 
so that the variable values as a function of a serial number are mono¬ 
tonic. The discs, or items, can be thought of as being used to build up a 
histogram of discs as shown in Figure 1. 

In this figure, each cell has the serial number in the top left-hand 
corner and the variate value in its body. The variate values in any given 
column are thus constant. Thus, using this example, if our source of 
random numbers gives the random numbers 34, 18 and 41 respectively, 
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the three successive samples chosen from this distribution will be 
6, 4 and 7. 

The function defined in this manner is a rather flat one in that the 
variate values only occasionally change value with the serial number, 
and it is possible to compress the information considerably by listing 



only the critical points in a table. These are the serial numbers of those 
items at which the variate value has changed. Thus the histogram can 
be turned into the following Table 4: 

Table 4 

Value 0 1 2 3 4 5 6 7 8 9 1.0 11 

First serial No. 1 2 5 9 14 22 32 39 44 47 49 50 

If the random number selected is 33, since this lies between the 
critical numbers 32 and 39, the value of the variate sample is taken as 6. 
This compression of the tabular function is available even for hand 
simulations, but is of greater importance if it is intended to use a digital 
computer for the selection process as it saves a considerable amount of 
storage space. 
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This tabular function has an interesting probabilistic intepretation. 
The distribution of a variable can be described by a density function in 
the continuous case or by a set of frequencies in the discrete case, but 
both cases (and other more complicated ones) can be described by 
means of the cumulative distribution function which specifies the 
probability of obtaining the given value or less from a sampling distri¬ 
bution. The tabular function introduced above is the inverse of this, 
and this is illustrated in Figures 2(a), 2(b) and 2(c). 

Figure 2(a) gives a typical density function, 2(b) gives its cumulative 
distribution function and 2(c) its inverse. The table look-up technique 
introduced before consists of choosing a point on the independent axis 
of the inverse cumulative distribution and choosing the corresponding 
dependent value as the sample value. It is very easy to see that this 
process wilt reproduce the required distribution as follows: 

Let r be the random number in the range 0-999 say. Then the distri- 

Y 

bution of u = -is approximately uniform over interval 0-1. 

1000 


Then 


(0 0 


p(u) = 


v 


1 

0 


P(u) = u 
1 


u ^ 0 
Ogttgl 
u ^ 1 


where p(u ) and P(u) are the probability density function, and cumulative 
distribution function respectively of u. 

If v is to have a p.d.f. f(x) with c.d.f. F(x) then let the inverse of 

F beO 

y = F(x) x = 0>(f) 


Put v = <D(w) 

Then 

Prob(x < X) = Prob(F(x) ^ F(X)) = Prob(w ^ F(X)) = F(X) 

Thus the c.d.f. of x is F(x) as required. 

Viewed in this light, it is seen that the problem of sampling from any 
distribution is that of transforming a random number representing the 
uniform random variable in the range of 0-1 by means of the inverse 
cumulative distribution function. The techniques considered so far 
define this function by means of a table, but there are other methods 
available for the representation of a function, and many of these can be 
applied in this case. The appropriate methods consist of: 
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(i) Analytic inversion of the cumulative distribution function and 
the calculation of the value of this function for the value of a 
selected uniform random variable. 

(ii) Numerical inverse interpolation in the distribution function 
determined analytically. 

(iii) A process of numerical inverse interpolation in a numerical 
approximation to the cumulative distribution function. 

(iv) The numerical approximation to the inverse cumulative distribu¬ 
tion function itself. 

Each of these methods has its merits in various circumstances, and 
we will consider them in turn. 


(i) Analytic Inversion . 

The probability density functions of theoretically derived distribu¬ 
tions are often complex in form and the possibility of analytic inversion 
of their integrals is usually difficult. However, there are some simple 
but quite important cases in which this is possible. For example, a 
commonly arising distribution is the negative exponential, which has 
p.d.f. of the form 

p(x) — e~ x 

and c.d.f. 

P(x) = l-e~ x 
The inverse of this is simply 

*= -lOge(l-P) 


Since the distribution of 1 —P is the uniform over 0-1, if the distribu¬ 
tion of P is uniform over 0-1 we have the simple rule: 

To find an exponentially distributed variable to take the negative 
of the logarithm of a uniform random variable. 

The convolution of the two uniform variables gives a triangular 
distribution defined by 


ro 


f 0 x < 0 


p(x) = 


X 

2 — x 
0 


P(x) = 


ix 2 


2x — \x z — 1 


0 ^ x ^ 1 
l^x^2 
x > 2 


1 





SAMPLING FROM. A..DISTRIBUTION 
P(x) is a function which can he inverted by the following rule: 


P X = V2 P 

$P g 1 X = l-y/W-P) 
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However, as the convolution is merely the distribution of the sum of 
two random variables, in this case it may be more economic merely to 
sample two uniform random variables and add them. 

A third distribution which can be treated in this way is the Cauchy 
distribution with probability density function: 


This gives the c.d.f. as 


p{x) 


1 1 
n 1 + x 2 


P(x ) = i + - tan 1 

71 


X 


and thus the inverse is given by 

x = n tan (P —£) 

(ii) Numerical inverse interpolation in the analytic function. 

There are cases in which it is possible to integrate the probability 
density function and represent the c.d.f. analytically, but the inversion 
is not possible analytically. In this case it is possible to proceed by 
generating the analytic function for a range of x in the neighbourhood 
of the required value and then using the random number selected as an 
interpolate for the required value. This is a tedious process for hand 
techniques, and it would only be considered on digital computers. In 
practice, the number of c.d.f.s that can be calculated by direct substitu¬ 
tion in an analytic formula are rather small, and this method is usually 
replaced by 

(iii) Numerical inverse interpolation in a numerical approximation to 
the c.d.f. In this case, an approximation to the c.d.f. is found by an 
algebraic formula, values are generated and interpolated. However, 
most effective numerical approximations to the c.d.f.s are quite arbitrary 
and empirical and under these circumstances it seems advisable to work 
directly with an approximation to the inverse of the c.d.f. and so avoid 
the necessity for the inverse interpolation. This is method (iv) which will 
be considered in some detail. 
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(iv) Approximation to Inverse c.d.f 

The general shape of the function required is given in Figure 2(c) 
and this has the properties that 

F -> oo as v -> 0 , F(x) -» + co as x -> 1 

(where x is now written for the independent variable P) and so the 
function is likely to satisfy a differential equation of the form 

f(x) = - + A- + <Kx) 

X 1 — x 

where 4>(x) is a regular function (i.e. has no singularities) and a and b 
are constants or bounded functions for v in the range [0, 1]. Taking 
a and b as constants gives 

F(x) - a log v + b log (1 - x) + 'F(v) 

where is another regular function. 

In practice it is found that this form is not very satisfactory and 
certain arbitrary adjustments have to be made to it to give better fit 
to a wider class of functions. The form finally found to be most useful is 

F(x) - a + bx + cx 2 + ot(l-x) 2 log x + flx 2 log(l-v) (1) 

This satisfies the requirements set out before and uses a simple poly¬ 
nomial for T^v) and replaces the constants a and b by functions to give 
better fit in the tails. In practice this formula is difficult to use because 
of the logarithmic function. If it is being used in manual calculations, 
then tables of logarithms can be used, but the merits of this method lie 
in the possibility of its automation and in this case, the calculation of 
the logarithm is a comparatively slow process and thus an approxima¬ 
tion to the logarithm is necessary. It is clear that the base of the loga¬ 
rithm is immaterial, as this merely alters the constants a and /? by a con¬ 
version factor, so we use the base 2 and use the crude approximation 

log 2 v ~ log 2 X.2 ~ d ~ -d+(l-X) 

where 

v = X.2~ d , i ^ Z< 1 

The advantage of this method is that we may now treat all distributions 
in the same way whether they be analytically derived or arising from 
practical data. In both cases, a method of determining the constants 
a , b , c, a and is required which we shall consider later. 
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As an example of this method, it has been found that for a normal 
distribution with mean n and standard deviation <7 the following 
formula gives an adequate approximation. 

a = (16384/x — 13452*960cr) x 2 -11 
b = (26953-865<j) x2 -11 
c = 0 

a - ( —3772-769 <j) x2~ u 
p = (3772-769ff)x2 _11 

The difference between the true and approximate cumulative distribu¬ 
tions is shown in Figure 3. 
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It will be seen that the approximate function has a discontinuous 
derivative which is caused by the approximation to the logarithm that 
is used. The approximation using an exact logarithm is also plotted, 
which shows that the error introduced by the logarithmic approxima¬ 
tion is less than the error of the main approximation, and that the 
maximum total error is very small. 

The approximation works successfully in the case of unimodal 
distributions, but is not successful for J-shaped distributions or multi¬ 
modal distributions. 



In the case of J-shaped distributions, a simple artifice will enable the 
technique to be used. Assume the distribution has an origin at zero and 
can be represented by the heavy line in Figure 4. 

We reflect this distribution in the y axis (dotted line) and rescale to 
give the chain-dotted distribution. This is a symmetric unimodal 
distribution and the transformation may be used. By taking the absolute 
value of a sample from this distribution a sample from the original 
distribution is obtained. 

If the distribution is multimodal, it can be represented as a mixture 
of several distributions each unimodal, and expressed by a suitable 
approximation: 

f(x) = Z Pi/iC*) 

i 

The interpretation of this is that the observation has come with prob¬ 
ability pi from a distribution with p.d.f./)(*). Techniques for the deter¬ 
mination of the component distributions have been developed, using the 
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method of moments, and more recently, techniques using the method 
of maximum likelihood to estimate the parameters of the system using a 
digital computer have been used. 

A random variable is used to select which distribution shall be 
sampled. The distribution chosen is then sampled and the resulting 
variable will be a proper selection from the multimodal distribution. 

Thus to represent the distribution 

F(x) = y a,) (£>, = 1) 

i= 1 

two uniform random variables ...Mi, Ri are chosen. A random normal 
deviate X(N(0, 1)) is selected using the variate R x and the approxima¬ 
tion described earlier. 

The variate R 2 is used to select the population sampled. 

If R 2 < pi \ f x-^+Xoi 

If Pi^R2<Pl + Pl / take \ X — jX 2 + Xa 2 
If Ri^Pi+Pz) [x = p 2 + X(T 2 

The technique used for obtaining the coefficients a, b , c , a, ft of equa¬ 
tion (1) should be a general one which is equally applicable if the 
distribution in question is given theoretically or empirically. We shall 
deal first with the case of an empirical distribution. From the data it is 
possible to construct a table giving for the various cumulative frequen¬ 
cies the corresponding variate value. We require to pass the curve of ‘best 
fit’ through this series of points and a possible method to use is the 
method of least squares. There is no theoretical justification for this pro¬ 
cess as the observations to which this method is applied are not in¬ 
dependent and do not have a homogeneous variance. Nevertheless, it is 
found that in practice this method gives adequate values for the 
coefficients a , b , c, a, /?. It is always advisable to verify the validity of the 
fitted function by calculating the function for a series of values of x and 
deriving an approximation to the p.d.f, by inverse interpolation and 
numerical interpolation or by graphical means. 

There are many applications in which the exact nature of the distri¬ 
bution is not known, because it is derived from empirical data and in 
this case it is only necessary to check the robustness of the answers 
from the sampling experiments performed against changes in the basic 
distributions used. If an automatic computer is used to do the sampling 
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processes, this is a possibility, but otherwise it would be prohibitively 
heavy in labour. 

For a theoretical distribution the percentiles are obtained and a 
similar fit by least squares is used on this as if it were data from an 
empirical distribution. 

It is at this point that the distinction between discrete and con¬ 
tinuous distributions is brought to the fore again. For a discrete 
distribution, the jumps in the distribution function are real and we 
require our function to pass through the saltuses. The values taken are 
then rounded down. For a continuous distribution, these jumps have 
been introduced artificially by the grouping process, and it is best for 
the function to pass through the mid-points of the saltuses. In the case 
of a theoretical distribution where the P-axis increments can be arranged 
constant, we need only fit to the left-hand end points and then adjust 
the constant a in the formula by half the percentile interval used. 

If it is required to determine the mean and variance of the fitted 
distribution, these can be found from the coefficients a , b , c, a, /?. If the 
mean and variance are p and a 2 respectively then 




xp(x) dx = 


I 


x dP(x ) 


<7 2 = 


x 2 dP(x) — p : 


But 


a + bP + cP 2 + a(l - P) 2 log P+ PP 2 log P 


(a + bP + cP 2 + a(l — P) 2 log P+ PP 2 log (1 - P) dP 


— a +- + - + (<*+P) 


(1~P) 2 log P dP 


The integral can be evaluated numerically from the approximation 
to log P used and we obtain 

^l-P) 2 log PdP± 0-904 

Jo 


u=a + - + ~ + 0-904(a + /3) 


Then 
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Similarly we can obtain a 1 as a quadratic function of the parameters 
b, c , a, /?. The parameter 0 which controls the location does not appear, 
of course. 


(T 2 = [b, c, a, p)Ci[b, c, a, /?]' 

where 


a = 


" + 0-083333 
+ 0-083333 
-0-312104 
+ 0-312104 


+ 0-083333 
+ 0-088888 
-0-365733 
+ 0-258474 


-0-312104 
-0-365733 
+ 1-973261 
+ 0 


+ 0*312104" 
+ 0-258474 
+ 0 

+ 1-973261 


For a symmetrical distribution, with mode m t we must have for 
each a: 

x = F(P) 
m-x — F(l~P) 
i.e. F(P)+F(l-P) = m 

2a + b + c)-2cP + 2cP 2 + (a +- P) 2 log P + P 2 log (1 - P)j = m 


From this the necessary conditions for symmetry are 

a — -P 

c = 0 


when the mode is given by 

m = 2 a + b 


The median is given by 

x = F® = a + i6 + ic + (a + jff)Jlogi 
= fl + i^ + ic-i(a + /J) 

This confirms that the mean and median only coincide for symmetric 
distributions. However, it may happen that a better fit is obtained by 
relaxing the condition of symmetry. This occurs, for example, with the 
approximation quoted for the normal distribution. 

If a quantile measure of dispersion is used the semi inter-quantile 
range R can be easily calculated 

= i(b + c)+^(j3 + tx)(9 logi-logf) 



22 


THE ART OF SIMULATION 

An alternative fitting procedure uses the method of minimum % 2 . 
For any choice of parameters a, b, c, a, p, the probabilities P t falling in 
a range can be obtained by inverse interpolation and the actual fre¬ 
quencies (in case of an empirical distribution) or the theoretical prob¬ 
ability (in case of a theoretical distribution) compared by compiling 
a x 2 statistic. The parameters can now be estimated to minimise this 
quantity. 

This procedure involves the solution of non-linear equations and can 
only be achieved by an iterative process. A method is that of steepest 
descent. 

This method involves an arbitrary element in the classes used to 
sub-divide the range and different answers are obtained from different 
divisions. However, all possible solutions will approximately reproduce 
the distribution. 

The arbitrary nature of these techniques is merely a reflection of the 
arbitrary nature of the fitting problem. With a limited number of 
parameters the fit can only be improved at one region of the distribution 
at the expense of another region. 

A very simple approach is to specify 5 quantiles of the distribution. 
Suppose the quantiles Q 1} Q 2 , ...,Q 5 are to be at x u x 2 . 

Then the equations 

+ bQ t + cQf + ocQfL(l -Q t ) + p( 1 - Qd 2 UQd L = 1,2,..., 5 

are linear in a, b, c, a, P and on solving these a fit is obtained which 
gives the desired values in the neighbourhood of x l9 x 2i • • •, t 5 . 

For a symmetric distribution, the necessary restraint must reduce 
the independent qualities that can be fitted to 3. The good fit for the 
normal distribution implies that for markedly non-normal symmetric 
distributions the fit cannot be so good. This can be remedied by 
replacing the polynomial terms by a higher order polynomial and at 
the expense of more calculation during sampling any accuracy can be 
attained. 

The logarithmic terms could be dispensed with but their inclusion 
greatly reduces the order of polynomial required. 

2.4 The Method of Mixtures 

The technique used to sample from a multimodal distribution can 
often give fast sampling techniques for unimodal distributions. It is 
particularly useful if very accurate sampling is required. 
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The distribution is represented as the mixture of a set of rectangular 
distributions and a residual. This is best explained by reference to 
Figure 5. 

Figure 5(a) gives the distribution to be sampled, partitioned into 3 
distributions. Figure 5(b) is a rough histogram which includes about 



Figure 5 


90% of the density. The set of triangular distributions of Figure 5(c) 
account for all but about 0-2% of the remaining density. This latter 
has the distribution of Figure 5(d). 

If the probability area under these functions are p x , p 2 , /> 3 , a random 
number is used to select the distribution to sample. 

If component (a) is selected, a second random number is used to 
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select a rectangular component and then a third random number is 
scaled and its origin adjusted to give the required value. 

If component (b) is selected, a triangle is selected with a second 
random number and a third random number is converted into the 
required value by a square root transformation. 

If component (c) is selected, a more complicated procedure will be 
necessary, but this will rarely occur and the average time taking a 
sample is quite small. 

For theoretical distributions, the class boundaries of the histogram 
can often be chosen, so that they or the class areas are easy to calculate 
and need not be stored. Otherwise, the method can be quite costly in 
storage space. 

2.5 The Rejection Procedure 

It is often possible to calculate the p.d.f. of a variable fairly easily, 
but difficult to evaluate its integral or the inverse. A technique called 
the rejection technique has been developed to deal with this situation. 

Consider a bounded p.d.f., i.e. one for which there exist values 
a, b, M such that 

( 0 x< a 

f(x) af^xf^b 0 ^ f(x) ^ M 

0 b> 0 

For any x in range [a, b] calculate y = ; the probability that a 

uniform random variable R will not exceed y is precisely y. If the value 
x is chosen at random on the range [a, b] and then rejected if R > y, 
the p.d.f. of accepted x's will be/(x). 

This follows since the probability of a given pair x t R giving an 
accepted value x is y dx and the probability of being accepted for 
any x is 

b y dx = — f fix) dx = — since fix) is a p.d.f. 

J / M) a M 

Thus the conditional p.d.f. of an accepted pair having a fixed x-value 

is My = fix). . J . 

The number of trials before a successful pair is found is a random 

variable and has a distribution 

p(n) = E(l-£)" -1 where E = -i 
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This has a mean value 

I (i—£) B l 

n-1 dE J j 

= = 1 
E J E 

The technique will not work without modification if the distribution 
required is unbounded. 

The modification consists of using an unbounded random variable 
to make the primary selection, this being obtained by a transformation 
of a uniform variable as previously described. 

The rejection criterion can also be generalised and the complete 
procedure is as follows: 

Put p( x ) = kQ(x)r(x) 

where r(x ) is a p.d.f., Q(x) is c.d.f. and k is a normalising constant. 
Select x from the p.d.f. r(x) and y from the c.d.f. Q(y). Accept the pair 
if x < y and use x as the required variable. 

The probability of choosing the x is r(x ) dx and of accepting it is 

Q(x). 

The probability that the pair is acceptable is 

f i 

Q(x)P(x) dx = - 

J k 

so the conditional probability of an accepted pair having a fixed 
x-value is kQ(x)P(x) dx as required. 

It is often possible to effect the test y ^ x without calculating Q(x). 

If M) is any increasing monotonic function with an inverse (j) the 
test y < x is entirely equivalent to f(y) </(*). The c.d.f. of z = /(>>) 
is given by g(0(z)) = T(z), say. 

If/and hence </> is chosen properly, the function T(z ) may be simple 
and the procedure becomes: 

Select x from R(x) and z from T(z) and accept the value x if and only 
if z</(x). 

Since Q( x ) = 1 

r(x) k 

0 

~ r(x)~ 


A.S.— 2 
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By an argument similar to that for the simple case, the mean sample 
size is k and so to economise on sampling, r(x) should be chosen so that 

max is minimised. 

* r(x) 

In practice this means choosing r(x) to be similar in shape to p(x). 
This is in accord with the dictates of common sense since then the 
amount of rejection to convert the primary selection probability for 
* from r(x) to p(x) will be small. 

A similar technique can be used writing 

p(x) = k[l-Q(x)]r(x) 

and only accept a pair if y > x. The argument and extension are exactly 
similar to the preceding case. 

To illustrate the technique, consider the normal distribution. Start 
with the distribution of the absolute value z of a normal variate of zero 
mean and unit standard deviation. This can be converted to a normal 
variable by assigning a sign at random. 

p(z) = e~ iz2 0 g z < 00 

= /?fe-tt*" 1 ’ 2 *-* 

V n 

Now the p.d.f. of a negative exponential distribution is r(x) = e~ x 
and its c.d.f. is Q(x) = l-e~ x . Then we may write 

p(z)= /— {1-eOOMz) 

V 71 

where y — \{z— l) 2 . 

Thus if y and z are selected from a negative exponential distribution 
and the pair only accepted if y ^ iO?—1) 2 > z have a normal 
distribution if a random sign is allotted. 

This technique is most useful in dealing with empirical distributions 
that can be roughly described by a standard easily derived distribu¬ 
tion and need ‘correction’ by a rejection process. The details of this 
application depend on the computing sources available and will not 
be discussed further. 
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The rejection technique can be amalgamated with the composition 
technique and the required p.d.f. represented as 

p(x) = I atp^x) = 2a, Qi(x)r,(x) 

where the Qfa) are c.d.f.s, r t {x) are p.d.f.s and a, are constants scaled 
to ensure that J p(x) dx-l. 

Select a value of i with probability a./Xa,. and then sample from pfa) 
by the rejection technique. 

The mixed scheme has been used by Batchelor to give a very fast 
procedure for the normal distribution. As in the previous example the 
modulus is first found and then a sign allotted at random. 

Write: 

Jl e ~ ixl = Jl ie ~ ix2 r(x)+i s(x) i 

when 

(0 , 0^x< l 

r(x) = s (x) = 

(0 x>l [2e- 2 <*-» X> 1 

i.e. r(x) is the p.d.f. of a uniform random variable and s(x) is that of a 
variable formed by adding 1 to an exponential variate of mean 
Sample from r(x) with probability f and from s(x) with probability i, 
i.e. select a random number u and 

(a) sample from r(x) if w-f < 0 

(b) sample from *y(x) if w-f > 0 

In the former case u has a uniform distribution in [0, f] and in the 
latter case u —J has a uniform distribution in [f, 1], 

In case (a) take x = | u,z — \x 2 . 

In case (b) take x= 1 -1 log (3w-2), z = i(x-2) 2 . 

Select an exponential variate y (mean 1) and accept ^ if y > z. 

2.6 Bivariate Sampling 

We must now turn to the problems of sampling from bivariate 
distributions. This is considerably more difficult and only an outline 
of the procedures that can be used are given. 

The ‘Top Haf Technique is of course available, in which the two 
variates in question are written on each disc. The mechanical extension 
to a two-dimensional tabular array can also be used. The difficulty here 
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is that the number of classes is likely to be large for any finely divided 
distribution and the number of discs or cells required becomes enormous. 

The Conditional Probability Technique . This utilises the fact that we 
may write 

p(x, y ) = p(x)p(y\x) 

where p(x) is the marginal distribution of x and p(y\x) is the con¬ 
ditional distribution of y, given that the other variate has the value x. 
We use one of the preceding techniques to sample the value of x and 
then use this value to select which distribution the second variate y shall 
be selected from. This is really only of value when the conditional 
probability distribution is of the same form for all values of x and the 
sampling may be done from a single distribution and a scale factor and 
shift of origin applied. Otherwise a separate distribution for each 
x value requires approximation and the amount of storage required 
becomes prohibitive. 

The special case of the normal distribution is the most important 
application. Suppose x u x 2 is a sample from a bivariate normal 
distribution with means p i9 p 2 , standard deviations o u o 2 and correla¬ 
tion p. Then the conditional distribution of x 2 given x x is normal with 

mean \i 2 + px t and standard deviation V1 - p 2 o 2 . 

Thus the sample procedure consists of taking two random variates 
yi and y 2 , say, from the normal distribution with mean zero and 
standard deviation 1, and forming 

Xi = AH + tfiTi 

X 2 = p 2 + px 1 4- Vl — P 2 ^2T2 

2.7 Multivariate Sampling 

The difficulties of multivariate sampling are similar to those of 
bivariate sampling with the difficulties increased many-fold. The 
principal difficulty in empirical studies will be to obtain an adequate 
quantity of data to represent the multivariate distribution adequately. 
The ‘Top Hat’ technique or its mechanical version can be used as 
before, but the number of discs or cells will rise astronomically. 

In theory, the conditional probability technique is still applicable 
by writing the multivariate distribution 

p(x u x 2 , x 3 , x A , pix^pix^x^pix^x^) ... 
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but it is very rare that it is possible to describe in adequately compact 
form the various conditional distributions involved. 

In the case of a multivariate normal distribution, an alternative 
method is available. In this case, for zero means the distribution is 
completely described by its variance matrix, a matrix whose elements 
describe the variances and co-variances of the different variates. We 
may then proceed as follows: 

If the variance matrix is Q, this is known to be positive definite and 
there exists a non-singular triangular matrix T such that 

q -1 = r r 

where F' is the transpose of T 

Hence MfT V = (xT)(rx)' 

and thus the variables y — T'x are independent normally distributed 
with unit variances. 

Thus the required variate may be obtained from the relation 


* = r~ l y 

This may be applied to the two-dimensional case: 




Hence 


Q - lss _ 1^ 


~P<>1^2~\ 


~G 2 2 


L P a l a 2> °2j (ClOlYil—P 2 )\_~ pGiG 2 > ^lj 

r ^ 1 K 1 f-i = ff iVi-p 2 , o~j 

Xj == ^(Vl —p 2 yi +py 2 ) 


x 2 = a 2 y 2 

which is substantially equivalent to the formula of the preceding 
section. 


2.8 Sampling under Dependence 

The foregoing sections have discussed the drawing of independent 
samples from various distributions. In some processes, however, the 
distribution from which a sample is drawn depends on the preceding 
samples. This can be regarded as a special case of drawing from a 
multi-variate distribution with the difficulty that, in general, the 
dimensionality of the distributions increases as the sample is gathered. 
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This complicated situation is one in which the practical difficulties of 
gathering sufficient information to describe this sequence of distribution 
functions often makes its simulation impossible. However, in certain 
theoretical studies, it is often assumed that certain simple forms of the 
dependence of one distribution on the preceding values hold. The most 
useful case concerned is the Markov process, where the sequence of 
values v 0 , x u v 2 , v 3 , ... are generated by the following rule: 

x r+l = ax r + s r+i r- 1,2,3,... (A) 

where the e r are independent sample values from some given distribu¬ 
tion. In this case, and its multivariate extension where 

m 

X,+ |- I «A-. = «- + i 

s = 0 

can be dealt with by successively sampling from the distribution e r and 
substituting in the equations (A) or (B). 

If this model is thought to be adequate to describe the dependence 
of any empirical data, the coefficients a s of equation (B) can be esti¬ 
mated by an extension of the methods of least squares. This technique 
is often used to describe the variation of the economic variables. 


CHAPTER 3 


MORE SAMPLING METHODS 


The methods outlined in the last chapter have been of general 
application, and quite independent of the distribution in question. 
However, it is possible to use alternative methods for some theoretical 
distributions and these are discussed briefly below. 


3.1 Normal Distribution 


A good approximation to a normal random deviate can be obtained 
by the use of the central limit theorem. This states that the mean of n 
random variables each from the same distribution will, in the limit as 
n tends to infinity, have a normal distribution. The standard deviation 

will be proportional to -i=. Thus if n uniform variates (random numbers 
V n 

treated as fractions) are added together, the resulting variate has an 
approximate normal distribution with mean \n and standard deviation 
Vw/12. In practice, reasonable approximations can be found with 
values of n as low as 10. 

Since it is possible to find theoretically the convolution of two 
uniform variates, the number of items to be added may be halved by 
sampling from that convolution. This has a triangular distribution 
with a distribution function given by 


P(x) = 


2x 2 
l — 2x 2 


x^i 


and thus, if the random number is P, the required transformation used 
to pick a variable from the triangular distribution is 

* = Vi? p § i * = V«i -p) 

The relative merit of these two alternative methods of generating a 
normal variate depends upon the difficulty of picking a random 
number and that of taking the square root. Thus, if this is being per¬ 
formed on an automatic digital computer, the simpler process picking 
more random numbers will usually be quicker, 
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Teichroew has extended this technique by considering polynomial 
functions in a random variable 9 

y=t 

r=l 

m 

e = I 

£=1 

f t being uniformly distributed. He considers the number of terms to be 
used in forming 9 and the optimum values of the coefficients a r and has 
developed some powerful formulae for the efficient sampling of the 
normal distribution. 

For example, if the normal distribution is truncated with a tail 
probability of 10“ 5 an adequate fit is given by m = 8, n — 13 and 
a 2r = 0. The values of a 2r +1 are given in the following table 


i a i 

1 4-1082897 

3 0-1491695 

5 0-0214077 

7 0-0041338 

9 0-0008779 

11 0-0001639 

13 0-0000043 


An interesting approximation for the distribution is given by Kahn. 
This derives from the approximation 


-W 


2 e 


-kx 


x > 0 


k = 


(:l + e ~ kx ) 2 

It is not clear how this remarkable approximation was discovered, 
but it does give an adequate representation. 

The c.d.f. corresponding to this p.d.f. is 


*x 

J 0 


2k e 


— kx 


c l + e ~ kx ) 2 
Thus the inverse c.d.f. is 


dx — 2 


dy 


(X+yY 


- r 2 T _ 

_1 + T_ i 1 


+ e 


-kx 


1 


1, 1 + y 

7 °g« "j ~ 
k 1 — y 


A random sign is attached to this variable where y is a uniform random 
variable in [0, 11. 
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Another approximation is given by Hastings. Put 

n = {~2 log (1 — y)}^ 

Then a standardised normal deviate is approximately 

X = t)~ _ a ° + a ^ 

bo + bi n + b 2 n 2 

where 

a 0 = 2-30753 b 0 = 1-00000 
fli = 0-27061 = 0-99229 

b 2 = 0-04481 

This applies if y S 0-5. If y < 0-5, replace y by 1 -y and change 
the sign of x. 

A powerful method recently developed by Box and Muller generates 
pairs of normal deviates. 

The joint distribution of two independent standardised normal 
deviates x u x 2 is 

p(x I, X 2 ) = — 

2n 

Consider the transformation 


x x = r cos 0 
x 2 = r sin 0 

The general distribution of r, 6 is 

p(i\ 6) dr dQ = -- e~ r2 r dr dO 
2 n 

d(ir 2 )x—dO 

2 n 

Thus r 2 and d are independently distributed: 6 has a uniform distribu¬ 
tion in (0, 2n) and r 2 has an exponential distribution. 

Given two uniform random variables U u U 2 

R = V-log e U, 

9 = 2iiU 2 

A.S.— 2* 
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are distributed as r and 6 and hence 



are exact independent normal deviates. 

The value of this method depends on the facilities and speed of calcu¬ 
lating logarithms and the trigonometric functions. 

The usual trigonometric function routines usually work faster for 
small angles and a single deviate can be obtained by assigning a random 
sign to 

X' = /log e — sin -U 2 
V U i 2 


3.2 The x 2 or Gamma Distribution 

The samples from this distribution may be obtained by using the 
fact that x 2 on n degrees of freedom is the convolution of n variables 
distributed as x 2 on one degree of freedom, which is in turn merely the 
square of a normal random deviate. Thus, to pick a sample value from 
X 2 on r degrees of freedom, r random normal deviates are selected, 
squared and summed. 

It is often necessary to sample from x 2 on n degrees of freedom 
where n is not fixed on successive occasions, but is determined either 
by a random process or by other considerations. In this case, the method 
outlined above is still available, but can be shortened if the lower 
bound to the value of n to appear is known. Suppose this is n 0 then 
we may take a sample from x 2 011 n o degrees of freedom using the 
transformation method and pick m — n — n 0 normal deviates x, x 2 , 
..., x m and form 

m 

Xn Xn 0 2+ X! 

i= 1 

Naturally n 0 must be a lower bound, because if otherwise, m will be 
negative. It is not possible to obtain the required distribution by sub¬ 
tracting the squares of random normal deviates since 
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This is easily seen, since the only quadratic forms of normal variates 
that have * distribution have matrices with idempotent eigenvalues. 

A quicker method hinges on the fact that an exponential distribution 
is x on 2 degrees of freedom. 

Thus for even degrees of freedom (n ~ 2p) we take 

xi =tu t 

i — 1 

where the £7. are exponentially distributed. 

For odd degrees of freedom n = 2p + 1 we take 


xl= s U t +z 2 
1=1 

where Z is normally distributed. 


3.3 Exponential Distribution 

It has already been mentioned that this distribution may be found 
merely by taking a scaled value of the logarithm of a random number. 

Von Neumann has suggested a rejection technique for selecting from 
an exponential distribution. 

Choose two random numbers R 0 , R v 


If R t s R 0 score 1 and choose a new random number R 


*3. 


If R 2 ^ score 1 

Continue in this way until the test R t <; R._ t fails. 

The joint distribution of the score / and the variable R 0 is obtained 
as follows. 

If exactly i tests are made R 0 ^ ^ R 2 

probability of this can be evaluated as 


• ■ ^ Ri < R i+1 and the 


K +1 

il (i+ 1)! 

The probability of / = / is_ ^ 1 

O'+l)! (/ + 2)1 

For given R 0 the probability of /even is 




nr 


R 0 
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The conditional probability of R 0 for i even is 

. _= (1 —e -1 ) -1 e~ Ro 

Thus if the sequence of tests is repeated until failure occurs on an 
even i the initial random number has been selected from the truncated 

distribution (1 e x . . 

The number of sequences n tried before success has a distribution 

(1 —e -1 ) e~ n . 

The variables n and R are independently distributed so the p.d.f. of 
x = n+R is given by e ~ x . 


3.4 The Poisson Distribution 

If the intervals between events are exponentially distributed, it is 
well known that the number of events in a fixed period of time has the 
Poisson distribution, and this fact may be used to generate such a 
distribution. The Poisson distribution, of course, is a frequently 
occurring example of a discrete distribution and is given by 

P(x=n) = — e~ l n = 0,1,2,... 


We construct a sequence y x ,yi,y% of variables with an exponential 
distribution and mean - and form their cumulative sum. Let n be 

A 


defined by 


« n+t 

t y t £ A < E ^ 

i=i i= l 


Then n will have a Poisson distribution with mean X. Of course, if A is 
large, the normal approximation can be used.. 

An alternative technique based on rejection is also available. 
Consider the distribution of the product of two uniform random 

variables. 

__ x 

For fixed R 0 Rl ~lT 


p(x) = -log x 
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By induction the product ]^[ R, has p.d.f. 

i = o 

f n (x) = (-logxflnl O^x^l 

= 0 otherwise 

and c.d.f. F„(x) = x £ (-1 ° S x) " . 

■ = o nl 

Take the product U n of n uniform random variables and test 

U n > e~ m 


n m j 

The probability of this is 1 — F n (e m )=l — e m £ — 

; = o ,/! 


The distribution of the first n for which U n > e m is 

n\ 

i.e. n has a Poisson distribution of index m. 


3.5 Other Rejection Techniques 

Kahn lists a number of other distributions which can be sampled by 
rejection techniques. A few examples follow: 

(1) To sample from the p.d.f. given by 


/(*) = 



0 


1 

otherwise 


This is the distribution of cos nR where R is a uniform variate in [0, 1]. 

(i) Select two uniform variates R 0 , R x 

(ii) Reject unless Rl + R± ^ 1 

(iii) Now ( R 0 , R l ) is a random point in the first quadrant of a circle 
and 

arg q = tan -1 ~ is uniform in 0, n 
x 2 


R 2 o-R i 
Rl+Rf 


The required variate is 

cos 2 rj = cos 2 r\ — sin 2 y\ = 



38 


THE ART OF SIMULATION 


In practice, if a pair is rejected, the last value can be retained for the 
next trial, reducing the number of random numbers needed. 

The efficiency of this process is obviously ?. 

(2) The following rejection procedure can be used to generate a 
number of different variates. 

(i) Set i — 0, select a uniform random variable R 0 . 

(ii) Select a random variable R i+ lf increase / by 1. 

(iii) Test if R t R i ^ 1 ; if so repeat stage (ii); 

if not set 

Y = Ri 
z — R 0 
/ = i 

The joint p.d.f.’s for (Z, /), (T, /), (Z, 7) can be obtained as 

x l 

/(*,/) = _— , 0 < x < 1 , i ^ 0 

9(y>i) = ^Z , iso 



That for Z is, of course, merely the original uniform distribution. 

If samples which stop on even values of i are rejected, the marginal 
distributions become 
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fo(x) = -£ 


i .'c 2i+1 x cosh x 


XV (2,.)! 


X 


where K - 1 - e 


-l 


0<x:£l 


3o (j,) = ( sinhl - sinh ^) 

K. 

h 0 (z) = 


e—1 


1 

0 ^ z < 1 


If samples which stop on odd values of i are rejected, the marginal 
distributions are 


f e (x) = ex sinh x , 0 ^ x ^ 1 

g e (x) — e(cosh 1 — cosh y) , 0 ^ y ^ 1 
K(x) = e-e z , O^z^l 

Replacing the test R t ^ R^, by R i+i ^ R t replaces the variables 
x, y, z by their complements. This enables a further set of distributions 
to be generated. 

A further set can be generated by replacing the test by R t < R 0 and 
yet another by (i + 1)jR* < iR 0 . For further details Kahn should be 
consulted. 

The whole process is an extension of the Von Neumann technique 
for generating an exponentially distributed variable. 


3.6 The Binomial Distribution 

This distribution is the most frequently required discrete distribution, 
and there are several special methods of generating variables from this. 
For the special case of P = i the digits of a uniform random variable 
are variables which take the values 0 and 1 with equal probabilities 
Thus the binomial variable of index n with P = \ can be formed by 
summing n digits of a uniform random variable. 

For the case of P # \ this method can be extended and is best 
explained by an example. 

Suppose we require a sample from a binomial distribution with 
P = h then take two random variables x l9 x 2 and express them in the 
binary scale, e.g. 

x t = •11010111001011001 
x 2 = -10101101100110101 
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If we now form a new number by a process of digitwise logical multi¬ 
plication of these two numbers we obtain the number 

*10000101000010001 

The occurrence of a 1 will only arise with probability of Hence 
the sum of n digits from the collation of two uniform random variables 
gives the required random variable from a binomial distribution of 
index n and P = i If a value of P = f is required, then we complement 
the number given by the collation, and sum n digits of this number. 

By a suitable sequence of collations and complementations we can 
obtain a sequence of digits in which the probability of a 1 may have any 
required value, expressed as a binary fraction. Thus to obtain a 
sequence with p — we take four random numbers, collate the first 
pair and complement giving a sequence with p — J, collate the second 
pair giving a sequence with p = ^ and collate the two results. 

The advantage of this technique is that several random variates 
with the required distribution can be obtained from a single random 
number or group of random numbers. 

A similar device has been described by Davis as follows. Treat the 
random number x as a fraction. 

If x < p, score 1 and form x 2 = x x . 

If x ^ p, score 0 and form x 2 = xq -p. 

The probability of scoring 1 is p and x: 2 is uniform in range (0, p) 
or (0, q). We could rescale x 2 to be uniform in 0, 1 and repeat, but the 
division can be avoided by rescaling the probability p to p or pq 
(q=l-p). 

Thus a repetitive process can be defined to give a sum ot scores. 

x x — x 
Pi=P 

If Xi < p h score 1 and form x i + 1 = x t 

Pi +1 = PPt 

If Xi ^ p h score 0 and formx i+1 = Xi-pi 

Pt+i = QPi 

This process may be continued until the successive multiplications have 
reduced the size of x to too few significant digits for the comparison 
to be accurate. 
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3.7 Selecting a Point at Random in an w-dimcnsional Volume 

If a geometric problem is being studied by simulation techniques the 
need to select a point at random in w-space is required. If the space is an 
n-dimensional cube of unit size the problem is trivial since the point 
Ui, x 2 , • •has the property that x 2 , .--,x are independent 
uniform variates. 

To select a random from any other bounded space V the rejection 
process can be used. Surmount the space by a larger cube, sample 
from this enclosing space and test if point is in F, if not reject it. The 
first accepted point is a randomly chosen point in V. 

In two and three dimensions, this process is quite satisfactory, but 
in higher dimensions the rejection rate becomes very high. For example, 
if V is an ^-dimensional sphere of unit radius the probability of accep¬ 
tance of a point randomly chosen in the hyper-cube — 1 ^ x t ^ 1 


(/ = 1, 2, ..., n) is 


r nt 2 


n2"“T(n/2) \4n 


Tie 


n/2 


and decreases alarmingly as n reaches values of 10-20. The rejection 
process then becomes impracticable. 

An alternative procedure consists of choosing a radius x (^1) and 
selecting a point on the surface of the hyper-sphere of radius x at 
random. Since the surface area of a hyper-sphere of radius v is 


27i' I/ V~ 1 

T(nl2f 


x must be sampled from a distribution with p.d.f. 

f(x) = nx n ~ l , 0 ^ .v s 1 

= 0 , otherwise 

The c.d.f. is F(x) = / , Ogx^l 
= 0 , otherwise 

and the usual inverse process gives 

x = \jy 

where y is a uniform variate. 

To pick a point at random on the surface of a hyper-sphere requires 
the selection of a direction at random. Pick x 2 > ...,x n from a 
normal distribution. 










CHAPTER 4 


RANDOM NUMBER TABLES 

4.1 Introduction and History 

In Chapter 2 we reduced the problem of sampling from a distribution 
to one of sampling from a uniform distribution (or if one is working 
arithmetically to picking an integer at random from a consecutive 
range). When sampling experiments are being performed manually, the 
main technique for ensuring that the numbers used are random consists 
of generating a set of numbers by some random process, tabulating 
them and using the tables on the various successive jobs of sampling 
undertaken. 

The modern history of sampling experiments in mathematical 
statistics started with the work of W. E. Gossett writing under the 
pseudonym of ‘Student’ from Guinness’s Brewery in Belfast. He dis¬ 
covered the celebrated ^-distribution named after him, and at the time 
he published the theoretical derivation of the distribution of this 
statistic, he backed his findings by sampling experiments. 

This technique of discovering or verifying the sampling distribution 
of interesting statistics was taken up in a vigorous fashion by the 
statistics school at the University College under Karl Pearson, and for 
several years a steady stream of papers was published giving the 
results of such experiments. In all these, the sampling was performed by 
the traditional ‘Top-Hat’ method using beads or discs in an urn or bag. 
The idea of using tables of random numbers was introduced by Tippett, 
when working in their laboratory on the distribution of the range of a 
sample from a normal distribution. To perform this work on the range, 
Tippett constructed a table of 10,400 random digits which he extracted 
as the terminal digits of entries in some census tables. 

This gave a new impetus to sampling experiments, which could now 
be performed much more quickly and with greater reliability than 
before. It soon became clear that the 10,400 numbers in Tippett’s 
tables were not sufficient, and M. G. Kendall and B. Babington-Smith, 
working in co-operation with University College, decided to form a far 
more extensive table. They produced a table of 100,000 decimal digits, 
using a mechanical randomiser to generate them. This will be described 
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in the next chapter. At about the same time, Fisher and Yates, working 
from Rothamsted, produced a smaller table of 15,000 digits which were 
in fact the 15th to the 19th digits of the logarithms given in A. J. 
Thompson’s mammoth Logarithmica Britannica . 

This set of tables sufficed for sampling experiments for many years, 
and it was not until after the Second World War that any significant 
addition to the stock of random number tables was made. The Rand 
Corporation in America produced 1,000,000 random digits, using an 
electronic randomiser with its own printing device. This machine will 
also be described in the next chapter. 

4.2 Tests of Random Number Tables 

The authors of all these tables held some doubts of the validity of 
their random sources and consequently determined to make some tests 
that these tables actually constituted a sample of random numbers. 
Kendall and Babington-Smith, in the paper in which they described their 
tables, also give an account of the tests which they applied to their 
tables and these have become, through the years, the standard tests 
applied for randomness. Kendall and Babington-Smith emphasised that 
the number of tests for randomness that can be applied to a sequence of 
digits is quite unlimited except by human ingenuity, and implied that it 
will always be possible to invent some test that a given table will fail, 
and in their paper they give an interesting philosophical discussion of 
the implications of such tests. Broadly speaking, their conclusions are 
that the tests of randomness applied to a sequence will depend upon the 
use to which this sequence is to be put. If it can be established that a 
particular type of patterning in the table is likely to give serious error to 
the results of some sampling experiment, then the table must be tested 
so that this particular type of patterning does not exist. This tailor-made 
approach, of course, is not very realistic in practice, and the custom has 
arisen of accepting the tests suggested by Kendall and Babington-Smith 
for nearly all purposes. These tests are now described. 

4.3 Frequency Test 

The first obvious requirement of a set of decimal digits is that each 
decimal digit shall occur with approximately equal frequency, and this 
test merely consists of recording the frequency of occurrence in the 
table of each decimal digit and comparing with its expected frequency 
(one-tenth the sampled size) by using a % 2 test. 
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4.4 The Serial Test 

The frequency test merely tests the probability of occurrence of each 
digit in a given position, but it does not exclude the possibility of serial 
correlation between the digits in successive positions. Thus the sequence 
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ... will satisfy the 
frequency test, but clearly it is not a random sequence. As a safeguard 
against serial correlations between successive digits, Kendall and 
Babington-Smith applied a test in which the frequencies of every pair of 
digits were recorded and tested against expected value. In this test, each 
digit is used twice, first as a leading digit, second as a trailing digit, and 
a comparison with the actual frequencies with the expected are once 
again made with a y 2 test. 

4.5 The Poker Test 

The serial test examines first order correlations only and for higher 
order serial correlations some more comprehensive test is required. 
This could be achieved by recording the frequencies of triads of digits 
compared with expectation and then with quadruples and quintuples. 
However, for large groups of this kind, the number of values contribut¬ 
ing to the contingency table will be severely restricted, and the numbers 
in each class may become rather small to the consequent detriment of 
the effectiveness or the validity of the y 2 test. To overcome this, Kendall 
and Babington-Smith proposed to amalgamate certain of the five digit 
groups and make a test of these composite groups. The groups chosen 
were selected according to the principles of poker. The groups were 
those in which 

(i) all digits were distinct; 

(ii) a pair of like digits occurred; 

(iii) two pairs of like digits occurred; 

(iv) three like digits occurred; 

(v) four like digits occurred; and 

(vi) all digits were alike. 

It is a simple exercise in combinatorial arithmetic to work out the 
expected frequencies in these classes and apply the y 2 test for goodness 
of fit of the actual distribution to the theoretical one. 

Because of this similarity between the composite groups and those in 
the card game of poker, it is known as the poker test. In this test, the 
whole sample n is divided into n/5 samples of quintuplets so that the 
problems of dependence between the different groups is eliminated. 
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4.6 The Gap Test 

The final test as suggested by Kendall and Babington-Smith was that 
the number of digits between successive zeros should be counted and the 
frequency of the various ‘gaps’ be compared with the expected values 
given by the ordinary geometric series. 

Kendall and Babington-Smith applied the y 2 method of testing good¬ 
ness of fit in all these cases, but it has been pointed out by Good that 
this is not valid for the serial test, as the use of each digit twice intro¬ 
duces a bias. He has shown that the expected value of the y 2 statistic 
calculated in this test will be 99, whereas, bearing in mind that there are 
10 linear restraints, it should have an expected value of 90. This diffi¬ 
culty can be circumvented at the expense of the stringency of the test 
by taking the digits in n pairs, and therefore using each digit once only. 
Although the practical implications of Good’s analysis are not very 
serious for the tests of decimal digits, they become very much more 
serious in the case of binary digits where there are only four classes in 
the serial test (00, 01, 10 and 11) and the y 2 calculated would have an 
expected value of 3 as opposed to the presumed value of 2. However, 
if tables of binary values were produced, or if the sequences of binary 
values produced by digital computers were to be tested, the volume of 
digits available is likely to be much higher, and then a test on the 
relative frequencies of all possible quintuplets could be made, dividing 
the sequence up into independent 5-bit parts. 

Other tests have been described from time to time in literature, and a 
short account of these is now given. 

4.7 The Run Test 

Kermack and McKendrick have suggested a test of random digits 
using a run property. They defined a run as a sequence of digits either 
in ascending or descending order, and the number of digits in a run is 
termed the length of the run. A descending run must, by definition, be 
followed by an ascending run and vice versa, and the length of the run 
down and up (or up and down) is termed a gap. The derivation of the 
distribution of the run and gap length is rather difficult, and in their 
original paper, they restrict themselves to the problem of evaluating 
this distribution for a randomly arranged set of unequal numbers. In 
the case of a table of random digits, of course, each digit may appear 
many times, and the possibility of ties in the sequence considerably 
complicates the analysis. For this reason, presumably, this test has 
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never received much attention, and is not today used for the testing of 
random sequences. 

4.8 Yule’s Test 

Yule, in using Tippett’s tables, noticed a certain type of patchiness 
in the digits which he suspected affected his results, and in consequence 
he applied a test of his own invention. This consisted of examining the 
actual distribution of the sums of five digits with its expected distribu¬ 
tion. He chose the five elements to be added by reading down the 
columns of the table, and showed that certain batches of Tippett’s 
tables did not satisfy this test of randomness. The theoretical distribu¬ 
tion of the sums of five equally likely digits presents some little theoreti¬ 
cal difficulty, which Yule overcame by a practical arithmetic process. 
He derived it in five stages: first finding the distribution of the sum of 
two digits, then the distribution of the sum of two digits, and one digit 
and so on. ’ 

4.9 The D 2 Test 

Random numbers are now often used for the evaluation of multi¬ 
dimensional integrals. This is regarded as a generalisation of the problem 
of evaluating an area. This uses random numbers to select a point at 
random in a unit square surrounding the area. When the numbers are 
to be used in this way, it would be sensible to test that the points so 
derived are uniformly distributed over the square. A test of this feature 
of the sequence has been devised by Gruenberger and Mark that 
consists of using sequences of four random numbers as the two pairs 
of co-ordinates of two points in a square and computing the distance 
between them. The distribution of this distance has a theoretical distri¬ 
bution that can be easily calculated from the formulae 

P(d) = 7rd 2 -M- + fL , 

= i + O “ 2)^ 2 + 4( d 2 - 1)* + }(d 2 ~ if ~ 1 d 4 - 4 d 2 sec“ 1 

1 < d ^ V2 

and, as usual, the actual can be compared with the expected by means of 
a x 2 test. 

4.10 Comments Concerning Tests 

These tests are usually made on a collection of random digits in 
fairly small batches. It is possible then by combining the Rvalue from 
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each of the batches to make an overall test of the whole table. If a 
table is large enough, then we must expect small parts of the table to 
exhibit highly regular patterns as any particular regular pattern must 
occur with exactly the same frequency as any other configuration. 
Thus we must not be surprised if a table of 10 10l ° digits contains at 
least one example of any one sequence of 10 10 digits. In particular, a 
whole series of zeros should occur. Such regular patterns will give high 
l 1 values in some or all of the tests. In fact, if the table is broken into 
small batches and for any given test the y 2 value is calculated for each 
batch, then the set of y 2 values obtained should actually be a sample 
from the appropriate y 1 distribution. Some of the later workers have 
tested their y 2 values from separate batches in this way. Regarded in 
the large, any table of random numbers which does not include its due 
proportion of high y 2 valued batches cannot be regarded as a random 
table. On the other hand, for practical use, that particular batch of the 
table would be quite useless in some small limited experiment. 

Thus the compilers of tables have usually described the tests that have 
been made on them and the results of those tests, and indicate which of the 
various batches in their table fail to meet each of the several tests applied. 

4.11 Tables of Other Random Variates 

Although we have described in some detail in the preceding chapter 
how to convert from random numbers to any other statistical distribu¬ 
tion, certain distributions are of such importance that tables of samples 
from such distributions can save an enormous amount of work in 
sampling experiments. Of paramount importance in statistical theory is 
the normal distribution, and two tables of random normal deviates have 
been published. These were both found by transforming tables of 
random numbers by means of the inverse cumulative formula given in 
Chapter 2. Mahalonobis transformed Tippett’s tables, and Herman 
Wold of Uppsala transformed the Kendall and Babington-Smith tables 
into 25,000 random normal deviates. Both these tables, when con¬ 
structed, were subjected to further tests for their randomness. The test 
used consisted of forming the distributions of the sum of n of these 
random normal deviates, and comparing this distribution with the 
expected normal distribution. 

4.12 Conclusion 

The tables of random numbers described in this chapter were 
historically of the utmost importance in the advancing of the technique 
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of sampling experiments, and are still an invaluable aid to workers who 
wish to make small-scale experiments. However, the advent of electronic 
computers has enabled the scale of, operations to be considerably 
enlarged, and these tables can now no longer be thought of as a good 
source of randomness .for the..enlarge sampling experiments being 
conducted. The next two chapters consider two alternative sources of 
randomness for large-scale experiments. The first of these is the physical 
generation of random digits by some apparatus, and the second i§ the 
production of sequences of digits that, although produced deter¬ 
ministically, will satisfy the various tests for randomness that have 
hitherto been applied to tables. 



CHAPTER 5 


RANDOM NUMBER GENERATORS 

5.1 Introduction 

We have seen that the method of simulation (or sampling experiments) 
requires a source of random digits which can be used singly or in 
collections as approximations to uniform random variates. Chapters 
2 and 3 were concerned with the use of such a source of random 
numbers for the generation of samples from any statistical distribution. 
This chapter will consider methods of deriving such random digits by 
physical means. 

There is a long history of building machines to generate random 
digits. This arose, in the first instance, from the need to study certain 
stochastic phenomena by building physical models of the systems. The 
present-day physical generators of random digits have derived from 
these attempts to build physical random simulators. 

The first and most important machines constructed for physical 
simulation were used by telephone traffic engineers to study the impact 
of random fluctuations of telephone demand on telephone exchanges, 
and the efficiency of various methods of routing calls through the 
available equipment to minimise the delays to those calls. Similar 
machines have been made to study the analogous problem of random 
effects on road traffic congestion. 

A wide variety of natural phenomena have been used to produce 
randomness and can be divided into the electronic and non-electronic 
devices. Amongst the non-electronic or mechanical devices, the most 
popular has been the application of the spinning disc used in many fair¬ 
ground games. A disc, divided into a set of equal numbered sectors, is 
spun and allowed to come to rest. The number of the sector selected by a 
fixed pointer is taken as the random number selected. 

Most electronic devices make use of the small variations in the 
thermionic emission from a valve due to the thermal agitation of the 
electrons in the filament. Another popular source of noise is the 
emission from a radio-active source. 

It has been proposed to use detection apparatus to pick up atmo¬ 
spheric radiation and convert this to a widely fluctuating voltage. 
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Another possibility is to utilise the Brownian movement and detect the 
incidence of a Brownian particle in a small field by means of photo¬ 
electric devices. A machine based on this principle is known to have 
been constructed. 

Controversy has raged about the validity of each of these methods and 
philosophers have argued whether such natural phenomena can be 
regarded as random. For practical purposes these arguments are 
irrelevant, and it seems that the right requirement of a phenomenon to 
be suitable for a random noise source is that there is no scientific 
explanation of the phenomena that is used and the effect that is observed 
is generated by a large number of atomic events. Moreover, these 
atomic events, in spite of their large numbers, must not display any of 
the statistical regularity that is observed as, say, in the gas laws. 

The crucial test that a process is random is that predictions concern¬ 
ing the future behaviour of the system cannot be improved from a 
knowledge of its past behaviour. Ideally, we should require proof that 
no rule for prediction is possible, but since there is no limit to the 
rules that can be tested, and ultimately the proof must depend upon 
experiment, this will not be possible. In practice, then, we are forced to 
accept any phenomena whose exact mechanism is unknown to us and 
whose behaviour is not predictable by any obvious deterministic laws. 

It has already been mentioned that the second major table of random 
numbers was produced by Kendall and Babington-Smith, using a 
randomising device. This was a mechanical disc mechanism. A disc was 
spun by an electric motor and was stopped at an arbitrary time by the 
observer and the digit against the pointer noted for inclusion in the 
table. 

The arbitrary time was selected by blindly drawing an electrode 
across a graphite maze drawn upon a piece of paper until the electrode 
made electrical contact with the maze. 

The principle of this machine is that there will be many revolutions 
of the disc between observations and the only relevant factor in deter¬ 
mining the digit recorded is the magnitude of the fractional part of the 
last revolution made. It is maintained that this fractional part is not 
under the control of the operator and so is equally likely to take any 
value and hence each of the ten sectors are equally likely to be selected. 
Kendall and Babington-Smith tested the tables produced by this 
mechanism by the several tests described in Chapter 4, and indeed (as 
was mentioned there) they were largely responsible for the invention of 
many of those tests. 
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Other mechanical randomisers have been built, but there is very little 
in the literature concerning them. The only exception to this is the 
stochastic analogue machine devised by Beer. This employs the principle 
of buffeting a ball in a random manner before allowing it to fall into 
one of a large number of pockets. This machine will be described in 
more detail later. 

5.2 The Electronic Machines 

These machines are based almost universally on converting the 
source of random noise into a train of pulses. This train of pulses is used 
to drive a cyclic counter. The drive is interrupted at fixed intervals of 
time and its state examined. The state between successive inspections of 
the counter depends on the number of pulses occurring in that interval 
of time; and more particularly on the remainder on dividing that 
number by the radix of the counter. Thus this mechanism is similar to 
that used with the spinning disc except that the roles of randomness and 
determinacy are interchanged. In the disc the inspection occurs at 
random intervals, and the events (change of state) are at constant inter¬ 
vals determined by the constant speed of the motor; in the electronic 
devices the inspection is at constant intervals and the events occur at 
random intervals. 

Cursory consideration of these two systems would lead one to believe 
that they would be equally efficient, but it has been shown by Thompson, 
one of the workers on the G.P.O. machine ERNIE, that this is not the 
case. He has given a mathematical analysis of these two situations, 
which shows that the process of uniform inspection of a randomly 
actuated counter gives rise to a more accurate random distribution of 
the positions of the counter than the other system. 

For those readers who would prefer a heuristic argument rather than 
a more mathematical one, the following may be useful. 

Although the number of pulses between inspections has a discrete 
distribution, if the average number of pulses per sampling period is high 
enough, it can be regarded as a continuous distribution. Divide the 
distribution into vertical strips whose width corresponds to the radix 
of the counter. Then the distribution of relative movement of the 
counter is obtained by superimposing these strips on to each other. 
Each strip has a curved top, but for reasonably symmetric distributions, 
each strip may be mated with a symmetrically disposed strip and so the 
top of the summed strips is level, i.e. the distribution is uniform. 

The wider the original distribution, the flatter the tops of the larger 
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strips are and the less levelling out is required by the summing process. 
Thus high variability is a desirable feature of the distribution, and this is 
the reason for favouring a Poisson source of pulses. 

If the successive movements are uniformly distributed, then the 
successive counter positions are uniformly distributed. To ensure that 
the positions are statistically independent, the number of pulses in 
successive intervals must be independently distributed. 

To ensure this, we require that the probability of generating a pulse 
at any instant does not depend on the history of the previous pulses 
generated. The random noise source from which pulses are generated 
can be analysed by means of auto-correlation functions, and it is fairly 
easy to see that for no dependence of one pulse train on another we 
require the auto-correlation coefficient for lags greater than or equal to 
the sampling interval to be zero. This requirement is met if the sampling 
interval is chosen large enough to exceed the maximum lag for which 
the auto-correlation of the originating noise source is significant. 

To obtain a pulse train from the noise source, this is fed into a high- 
gain amplifier whose output is limited both ends, thereby producing an 
irregular square wave form. This square wave form is differentiated so 
that each swing of the amplified noise generates a pulse. 

In practice, the differentiator in such a pulse-generating circuit will 
have a dead-time associated with it so that after generating a pulse it 
cannot produce another one for a certain interval of time. Extensive 
mathematical analyses have been made of the performance of systems 
involving dead-times in connection with counting the radio-active 
particles and the use of Geiger counters as measures of the strength of 
radio-activity. There is no need to go into the details of these analyses, 
since, for the purposes of constructing a random number generator, we 
rely upon the production of a long train of irregularly spaced pulses and 
counting them. The exact distribution of the intervals between them is 
now no longer of significance. 

We now describe, in more detail, some of the actual machines that 
have been constructed. 

5.3 ERNIE 

This machine has obtained popular notice for its use in connection 
with the national lottery system recently introduced in this country. 
The randomising technique used is precisely the one described above 
and uses neon discharge tubes as the sources of noise. Each time the 
noise wave forms pass up above a certain fixed level, the pulse generators 




54 THE ART OF SIMULATION 

emit a short pulse of two microseconds duration, followed by a dead¬ 
time of 30 microseconds. The cyclic counters used have 10 or 24 
positions, the latter being used to select a random letter and the former 
for random decimal digits. The counters are inspected at intervals of 
about one-sixth of a second to produce the random digits. 

The output from these generators should all be independent of each 
other, which implies that the noise wave forms from the neons are 
independent, and it is thus important to ensure that there are no 
common effects caused by fluctuating anode voltages. 

For a machine used for the purposes of ERNIE, it is most important 
that no fault should develop during the selection of bond numbers, 
and for this purpose a redundancy technique has been used in the 
generation of the random digits. Nine random numbers are required, 
but ten cyclic counters are provided. These are connected in pairs, as is 
shown in Figure 6. 

The final outputs are taken as the differences of each pair of genera¬ 
tors. This has the effect of increasing the variability of the number of 
pulses counted, and hence leads to an improvement in the approach to 
equi-probability. 

Its main purpose, however, is to deal with the possibility of the 
breakdown of one of the original generators. If, say, U2 breaks down, 
then VI = Ul-K, where K is the constant value from the broken 
generator. Thus, if Ul is uniformly distributed, so is VI. Moreover, if 
ATs not fixed, but has any distribution whatsoever, it is still true that VI 
has a uniform distribution. Thus the nine outputs of the machine are 
safeguarded against the breakdown of any one of the ten generators. 

The arrangement preserves the independence of the separate outputs 
since if the ten cyclic counter outputs are r 2 , . • •> ?io the final 
outputs are s l9 s 2 , .. .,s 9 , we have 

Pr(Si = k u s i+x = k 2 ) = Pr(r { —r i+1 = k u r i+i 
= Pr(r f = ki + r i+1 , r {+2 
= Pr(r t = k' u r i+2 = k f 2 ) 

= Pr(Si = k 1 )Pr(s i+1 = k 2 ) 

This principle could be extended to safeguard against more than one 
breakdown, and the interested reader is referred to the discussion on 
Thompson’s paper for an analysis of the methods by which this can be 
achieved. 


-r i+2 = k 2 ) 
= r i+1 -k 2 ) 
= 10" 2 



random number generators 


55 


1 


2 


3 


4 


5 


6 


7 


8 


9 


Output 

Registers 

















8 


8 



^— 

Cyclic 

Counters 




10 


<- 


Pulse 

Generators 


10 


Random 

Sources 


Figure 6 


This machine is not typical of random number generators in that its 
output has a special application, and therefore a special form. It was 
developed from the traffic machines built by the G.P.O., which em¬ 
ployed the same physical means of generation. 

5.4 The Rand Corporation Machine 

The Rand Corporation of America have recently published a table 
of one million random digits, which were produced on an electronic 
randomiser. This machine was of standard pattern and generated 
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decimal digits at the rate of one per second and printed the value, 

obtained. 

5.5 The Manchester Mark I Computer Randomising Unit 

The first automatic computer built by Ferranti Limited (and designed 
by the Department of Electrical Engineering of the University of 
Manchester) incorporated a random number generator which used the 
standard principle. This machine worked in the binary scale and so had 
a binary counter (more usually known as a flip-flop ) having only two 
states. This was continuously complemented by a train of pulses 
generated from a noise source, which, in this case, was the thermal 
agitation in the emission from a thermionic diode. The state of the flip- 
flop was inspected at times governed by the master clock of the machine, 
and the resulting digits collected into a register to form a random binary 
number. No details of this mechanism nor any descriptions of simula¬ 
tion experiments on the Manchester Mark I machine have, using this 
randomiser, ever been published to the author s knowledge. Further 
Ferranti machines have not incorporated such random number genera¬ 
tors. The reader may draw his own conclusions. 

5.6 The United Steel Companies Random Number Generator 

This machine is a fast generator which has been built for experiments 
on cybernetic techniques of control of general systems. It uses the 
standard principles for the generation of random digits and, like the 
Manchester Mark I computer, works in the binary scale. The irregular 
square wave formed from the noise in a thermionic diode is fed directly 
to a binary counter, which is complemented by each rising edge of the 
wave form. 

As in ERNIE, a comparison technique is used, but instead of using 
two random sources with identical characteristics, the same random 
number generator is used twice over. This is effected by supplying two 
binary counters, and a control mechanism which goes through the 
following cycle. First a gate 1 is opened, allowing the pulses to feed 
into counter 1, then gate 1 is shut and gate 2 opened, allowing the 
pulses to feed into counter 2, gate 2 is shut and an inspection gate 3, 
leading to an anti-equivalent circuit between states of counters 1 and 2, 
is opened. If these digits are unequal, the digit is recorded in a shifting 
register as that of the state of counter 1. If they are equal, no digit is 
recorded and the process is repeated. Figure 7 illustrates the process. 

This rejection process is more effective in giving rise to equi-probable 
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distribution than the mere modulo reduction technique used in ERNIE. 
It only requires, for the production of digits, each with a probability 
i, that the two noise sources have exactly the same statistical behaviour 
and this is automatically guaranteed. This follows, since if the prob¬ 
ability of a 1 from this noise source is p, the probability of an output is 
%pq and the probability that this is a 1 is pq. It has the disadvanlage that 
the production of digits is irregular and proceeds at about an average 
of half the rate if the rejection process were not used. However, since a 



Figure 7 


sampling rate at about 20,000 inspections per second is required, the 
number of pulses or their equivalent which can be counted into the 
flip-flop is much less by a factor of about 20 than that used in ERNIE. 
The overall performance of the two machines is probably rather 
similar. 

A mathematical analysis of the rejection techniques shows that not 
only is the correct probability of each digit produced, but that any 
residual correlation between successive digits is also reduced. The 
rejection process has the same advantage as the combination process in 
ERNIE in that if one part of the mechanism corresponding to one 
generator fails, the mechanism still produces random digits of quality 
determined by the elementary process. As in the Manchester Mark I 
machine, these digits are collected in a register and are available to an 
external source in groups of 15. 

The demonstration that the effect of correlation in the noise sources 
is reduced in the rejection process requires some assumption of the 
type of dependence. First suppose that the digits from each source form 
a Markov chain in which the probabilities of obtaining a 1 after a 0, 
A.S.— 3 
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or 1, are p 0 , or p l9 respectively. The resulting sequence is also clearly 
Markovian and it only remains to determine the two probabilities 
corresponding to p 0 and p x . 

Denote the pairs 00, 01, 10, 11 by O', 0, 1, 1' and the probabilities 
that a sequence of n new symbols terminating with the symbol 6 with 
all other symbols 0' or 1' shall follow an old symbol r, by p^J. 
The usual difference equation technique gives 

p(;o = Aa n + Bp n 

where a and p are the roots of 


(x-q 2 0 )(x-pl)-p 0 qoPiqi = 0) , (<?, = 1 ~Pr) 


and A and B are constants determined by the initial conditions. 

The probabilities that a 1 shall follow the symbol r in the derived 
sequence are 


Aa Bp = A<x + Bp-(A + B)ocp 


(s + r = 1) 


Substituting for Aoc + Bp and A + B from the initial conditions we 
obtain 


oo 



n=l 


1 

l + 3o + Pi 


CO 



qp+pi 
1 +^ 0 + Fi 


Write 

then 


Pi = P + 8 » Po — p —8 » $ = i(Pi—Po) 

p<» = i S (1) = 8/2(l + 8) 


S measures the departure from independence of the digits of the se¬ 
quences and so the pairing process has reduced the dependence. If 
the process is repeated n times we obtain 




5 

2 "[l + (2-i" +1 >S] 


It is well known, or easily established, that the probabilities of a 
Markov chain of n symbols ending in i are 


</>‘ 0) = — 1 —[<Jo + Po(Pi - Po)" ‘] 

Po + Qo 

= — 1 —[po-Po(p.-Por‘] 

Pa + Qo 



aliilllliliL 11 IldlJiikllh Mill IllJli-. Ill lb 1Lf! i Mill Iftkjlilillliii!!JiilillJIli 1 ^ Lu I! *\ 
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Thus the probability of an indefinitely long derived sequence ending 
in a 1 is 

This observation shows how the dependence between the digits of the 
sequence can be removed entirely after one stage of pairing. Suppose 
the symbol following each 0 of the derived sequence is changed into its 
opposite. Then in the Markov scheme pf," is changed into q ( 0 lf . But 
?o = (<7o + Pi)/0 +do + Pi) = p\ i} . Hence the resulting sequence rs an 
independent sequence. A second pairing will produce a random process 
Next consider a sequence subject to a trend, that is the value of p is 
steadily increasing or decreasing as the trial proceeds. Let p denote the 

probability of a 1 at the rth trial. Then assuming a linear trend for 
simplicity, put 

Pr — p( 1 + a r) a p cur 1 

Then the probabilities that the rth pair of digits are 01 and 10 are 

pq+ap 2 + 2').p(q — 1 — ap)r — 4ct 2 p 2 r 2 
and 


pq - apq + 2 ap(q - 1 - ap)r - 4v 2 p 2 r 2 

and the conditional probability that the rth pair of digits will be 
recorded in the new sequence as a 1 is 


pq(l~cc) 


---—. l + _ - * - r 

2pq + ccp(p~q)l q(l-oi)(2q + aq-p) 


2a 2 (q — p + oip) 


] 


a i- 


2 q 


'l+^izP), 




The average number of symbol pairs retained from n digits is of order 
npq so that the new trend equation is approximately 



<* 2 (g-p) 
2 pq 2 


A repetition of the process gives 



P < 2 >=Ji-?!iiz^iri + M£ Z rt_ 2 r i 
1 2 pq JL pV J 

Thus the pairing process applied repetitively will rapidly reduce trend 
The pairing process is rather wasteful of digits. Thus, if the sequence 
happens to be random, on average half the digits are discarded and of 
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those that are retained each pair gives only a single symbol. The final 
sequence contains only 25% of the digits of the original sequence. 
In general, it is the percentage of the information retained in the final 
sequence that is required. Table 5 shows the quantity for^ = 0T(0T)0*5. 
The calculations leading to this table have neglected an amount of 
information contained in the variation of sequence length, but this can 
be shown to increase as the logarithm of the sequence length, whereas 
the main contribution increases directly as the sequence length. 

This loss of information arises from discarding some of the symbol 
pairs; we can recover some of this lost information. If the symbol pair 
00 is coded as 0 and the pair 11 as 1 in an auxiliary sequence, this will 
be similar to the original sequence with p replaced by p z /{p 2j rq 2 ). An 
application of the pairing process to this sequence will produce further 
random digits. Repeat these two transformations on the second set of 
discarded digit pairs and continue in this way indefinitely. The third 
column of Table 5 gives the total percentage of information retained if 
this recovery process is used. The efficiency of the process is not much 
improved. The remaining loss of information arises from the irreversible 
process involved in transforming the original sequence into the two 
sequences of accepted and rejected digits. 


. 

Table 5 



Percentage of 

Percentage of 


information in 

information 

p 

derived sequence 

after recovery 

01 

0*19190 

0*20250 

0-2 

0*22163 

0*24851 

0*3 

0*23829 

0*28532 

0-4 

0*24718 

0*31584 

0*5 

0*25000 

0*33333 


By using more complicated coding of the digits more information 
can be saved. As an example, suppose the digits of the sequence are 
taken in sets of 4, and recoded according to the following table. 


Table 6 


Old 

New 

Old 

New 

Old 

New 

Old 

New 

0000 

Reject 

0100 

10 

1000 

11 

1100 

Reject 

0001 

00 

0101 

01 

1001 

11 

1101 

10 

0010 

01 

0110 

10 

1010 

Reject 

1110 

11 

0011 

00 

0111 

00 

1011 

01 

1111 

Reject 
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This is constructed by taking groups of 4 individual configurations 
with the same probabilities, assigning one member of each set of 4 to 
the symbol pairs 00, 01, 10, 11 and discarding any spare ones. Applied 
to a random sequence this process produces a sequence of length three- 
eighths that of the original sequence. If the rejects 0000 and 1010 are 
recorded as 0, and 1100 and 1111 as 1, in an auxiliary sequence and this 
recoded as above and this repeated indefinitely, the final sequence can 
be extended to a total of 40% of the length of the original sequence. 
Table 7 shows the percentage of information retained for a general 
independent sequence by one application of the coding and by its 
repeated application. Comparison with Table 5 indicates the increase in 
information content and the comparative stability of the efficiency 
with p. 


Table 7 


p 

Single 

application 

Repeated 

application 

01 

0*34926 

0*35781 

0*2 

0*37234 

0*38933 

0*3 

0*37649 

0*39903 

0*4 

0*37571 

0*42333 

0*5 

0*37500 

0*40000 


If the digits are taken in larger groups and a similar process applied, 
the efficiency does not increase beyond the values of Table 7. A formal 
proof that the coding of Table 6 is optimum of its class has not yet been 
found but exhaustive trials for groups up to 10, the practical limit, show 
that it is true for that range. 

The problem of determining a coding process which retains all the 
information for any value of p is likely to prove difficult since any coding 
process to convert such a sequence for a known p to a random sequence 
involves the consideration of the whole sequence as a unit. 

5.7 The General Electric Electronic Probability Generator 

This is a machine that has been built for studying non-stationary 
random walks. This machine does not use the principle of accumulating 
a large number of pulses in a counter, but each pulse individually is 
used to give an output digit. It has been built so that the probability 
of an output digit being one takes the general value /?, which is under 
manual control. A second terminal is given whose output is always 
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complementary to the main output. It can be likened to an electronic 
biased penny-tosser. 

Figure 8 shows a diagram of this machine. 



An oscillator drives a series of circuits which produce a regular 
square wave output in which the proportion p of time that the voltage 
is high is determined by a manually controllable input voltage. This 
wave form and its complement are used to control two three-way ‘and’ 
gates G 0 and G v A second common input to these ‘and’ gates is from 
a ‘lock-out’ flip-flop. When this is set on, the gates are opened to 
receive pulses from a pulse generator of standard pattern. The first 
pulse to arrive passes through one, and only one, of the two gates, which 
gate depending on when the pulse arrives in relation to the square wave 
input. After passing through a gate the pulse is regenerated and forms 
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the output on one of the output lines. Thus the probability of it passing 
the upper gate is determined by the manually set input voltage. In either 

lodTn h t fl eg fl nerat ff d P UlS t ‘ S alS ° r ° Uted V ' a “ ‘ or ’ gate t0 switch the 
lock-out flip-flop off thereby preventing more than one pulse appearing 

from the system. The next digit is then read by setting the lock-o J 

flip-flop again. This can be done either by a single shot mechanism 

from an external source of any kind, or from a built-in multi-vibrator. 

brom figures published concerning this machine it is clear that the 

output of this generator is not of high quality, but it does have a very 

fast production rate and is no doubt suitable for a wide variety of 

sampling experiments in which the quality of the random digits is not 
very important. 

The principle of operation is an adaptation of the rotary disc method 
using two unequal sectors on the disc. The known inferiority of this 
technique (at least theoretically) may explain the disappointing per- 
formance of this rather elegant machine. 



The machine can be further extended to produce dependent , 
sequences by means of an additional circuit shown in Figure 9 in 
which the output digits are used to set a further binary counter which 
in turn, controls two gates that determine which of two possible input 
voltages are applied to the square wave generator. Thus by setting 
different values for these two voltages, a first order Markov process 
can be generated. 

Although this has not been done in the General Electric machine, it 
is clear that the process can be generalised and multiple order Markov 
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chains developed. Figure 10 shows how this is done, for a third order 

C The conditional probability flip-flop is replaced by a register of such 
devices, and after each pulse has been obtained on an output terminal 
the contents of the register are shifted one place, recording in the last 
vacated place the last digit. A diode matrix based on the digits of this 
register controls 2" gates which admit one of 2" manually set voltages 



Figure 10 


to the square wave generator. Thus, by assigning the right values to 
these 2" voltages any n-dimensional Markov chain can be developed. 

5.8 The Lion Random Selector 

J. R. and K. S. Lion describe a generator using thyratrons (gas-filled 
triode valves) as the output mechanism, illustrated in Figure 1L 

Such a valve will remain non-conducting until the grid is raised to a 
critical voltage when it ‘fires’ and will remain conducting. Sinusoidal 
voltages exactly out of phase are applied to the grids of two thyratrons 
with a base voltage that ensures that neither thyratron will fire. If the 
base voltage is now raised, one grid will be raised at that instant above 
the critical voltage but the other will not. This causes one thyratron to 
fire and a feedback current is provided, which now ensures that the other 
valve cannot fire. Which thyratron fires is determined by the phase of 
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the sinusoid at the moment of raising the base voltage. This is achieved 
by closing a switch S' by a thermal delay and the device depends on the 
inherent variability of such thermal devices. By choosing a sufficiently 
high frequency for the oscillation, the variation in the thermal delay 
changes the grid voltage at a random point in a cycle. 



Thus this machine operates on the roulette wheel principle. 

The Lion machine uses a 60-cycle oscillation (the American standard 
mains frequency) and the spread of the thermal delay corresponds to 
about 3 cycles. Increasing the frequency of oscillation could improve 
the performance. 

Records of individual digit frequencies and of pairs and triplets on a 
sample of 339 digits gave no indication of departure from randomness. 

5.9 The R.C.A. Random Number Generator 

This generator utilises the principle of the parmetron, a device used 
in computing machines developed by the Japanese electrical firm 
Kanematsu and independently suggested by J. Von Neumann. 

The principle of this device depends on the possibility of setting up 
and maintaining a stable oscillation in an element at a frequency //2 
when this is connected to a reactor driven at a frequency/. Two stable 
configurations are possible exactly out of phase with each other. Thus 
such an element can be used as a dynamic bistable device. 

The advantage of this rather complicated method of achieving two 
A.S.— 3* 
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stable states is the high frequencies that can be used. Several thousand 
megacycle devices are easily achieved, and very fast computers are in 
development using this element. 

In a computer, the choice of the stable configuration is determined by 
logical circuits. In the R.C.A. random number generator, the element is 
set in oscillation from a static state by switching on the main frequency. 
The phase of the subharmonic oscillation now depends on electrical 
noise present at the moment of switching. 

Thus this is another example of an electronic roulette wheel. 

In practice, two 2,000 M/c oscillators are used and the phases of the 
subharmonic oscillations are compared. If these are equal, a digit one 
is recorded, otherwise a nought is recorded. At these incredible speeds 
a sampling rate of about 3 x 10 7 per second is achieved. 

Such high rates of sampling introduce difficulties of testing the out¬ 
put for randomness. The machine described has been tested by counting 
the number of ones in 228 samples of 1,000 and the resulting distribution 
checked by a % 2 test against the theoretical binomial distribution. 

5.10 Analogue Randomisers 

Random number generators are most suitable for digital applications 
of the sampling technique. In recent years a large range of analogue 
computers based on high gain amplifiers with various forms of feed¬ 
back which give rise to the various linear operators such as adders, 
differentiators, integrators, etc., have been developed, for the study of 
analogue processes by machinery. If such devices are assembled to 
study the effect of random influences on a physical system, a varying 
source of voltage with a given probability distribution is required. 

As far as is known, no machines have been built to provide this 
facility. 

However, various mechanical analogue machines have been built 
and at least one mechanical analogue randomiser exists. This is the 
stochastic analogue machine devised by Stafford Beer. The purpose of 
this machine was to study the behaviour of queues, and to do this, an 
analogue quantity to represent the process time was required. As one 
of the main purposes of this machine was to demonstrate the effects of 
stochastic variation on industrial processes to management, it was felt 
that the analogy used should be as close as possible and the analogue 
quantity that was used was time itself. 

Thus the problem was reduced to the construction of time intervals, 
which followed a given probability distribution. This machine used a 
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set of pockets. Thus the probability *of L "'h connected to another 
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in Figure 12. The whole machine is illustrated 

A battery of ten of these machines has been built and are used t„ * a 
complex queueing situations. “ used to stud y 
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in having a reproducible sequence of digits that nevertheless display 
properties similar to random sequences. 

Another possible application of such pseudo-random number genera¬ 
tors is in demonstration equipment that is designed to illustrate the 
effects of random behaviour but at the same time is required to be 
comparatively cheap. 

The problem consists of generating a very long cycle of digits. A 
possible way of achieving a binary sequence of digits that has been 
used m a queue demonstrator has been constructed at the United Steel 
Companies. 

This machine simulates four process machines, each with an adjust¬ 
able mean service time, which is served by a common stock of material 

(a queue) to which items are added at varying intervals of adjustable 
mean length. 

To simplify the machine, each process distribution and the inter- 
arrival distribution are assumed to be exponential and are defined 
completely by their mean values. 

The machine advances in discrete steps of time marked by a train of 
equally spaced pulses. At each pulse, the four machines and the arrival 
point are scanned and a decision made if an event is to occur. This is 
achieved by inspecting a random binary digit; if this is a one the event 
occurs, otherwise no event occurs. For the machines, the event is that 
the present item, if any, being processed is finished; for the arrival point, 
the event is an arrival. 

The assumption of an exponential distribution enables the sequence 
of digits to be a random one, since this will ensure that the distribution 
of intervals between events will be a geometric one. This is the discrete 
analogue of an exponential distribution. 

To allow for the varying mean values associated with the different 
machines, the probability of a one for any given digit must be adjust¬ 
able. This complication will be ignored in the initial description of the 
pseudo-random number generator. 

The machine is an electro-mechanical one, and the randomising unit 
is composed of a set of uniselectors. These are electrically operated 
multi-way multi-pole switches. A bank of 200 contacts, arranged in 
8 banks of 25 contacts, are scanned by 8 wipers. On energising the device 
all the wipers move from one row of 8 contacts to the next, returning 
to the first after scanning the last row. 

If 2r of the contacts randomly selected are earthed and the 200 
contacts scanned in some order, then earth will be found on r % of scans. 
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If the scan is repeated in the same order, a cyclic pattern of length 200 
will develop, but if a large number of different scanning orders can be 
generated, the cycle length will be much increased. For convenience all 
the scanning orders will proceed from consecutive rows and only the 
choice of bank is available to generate different orders. 

This is achieved with two further uniselectors. The eight wipers of 
the first uniselector are wired on to the contacts of the first 23 rows 
of the second uniselector in a random pattern, so that each occurs 23 
times. The eight outlets of this uniselector in turn are each connected 
to three rows of one bank of the third uniselector in a random order. 

To obtain the next digit in the sequence, uniselectors 1 and 2 are 
stepped and uniselector 3 is stepped only if the last digit is a one. 
Uniselectors 2 and 3 skip over their blank levels. 

The operation of this mechanism has defied all attempts at theoretical 
analysis of its behaviour, but has been found in practice to generate a 
sequence of digits that pass all the classical tests for randomness. 

An indication of the length of cycle obtainable can be obtained as 
follows. 

If the contacts of the first uniselector are numbered in any conven¬ 
tional way, the number of the contact connected to the output is defined 
by the levels r, s, t of the three uniselectors and so the presence or 
absence of earth can be represented by a function w(r, 5, t\ taking values 
0 or 1 according to the earth condition of the connected contact. 

Now denoting wth values by a sub-script, we have 

r fI+1 = (r B +1) mod 25 
s n+ i = (s n +1) mod 23 
t n+ 1 = (?„ + «„) mod 24 

Assuming r, s, t start at zero, the cycle length is given by the least n 
for which 

r n = s„ = t n = 0 
i.e. n = 0 mod 25 

n = 0 mod 23 i.e. n = 0 mod 575 

n— 1 

U„ = Y, U r = ° m0d 24 

r = 0 

Now assume the mechanism is perfect; then lLu r is a binomial variable 
with probability p and index n. 
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The distribution U„ will be approximately uniform and thus the 
probability that the cycle is of length 575m is 

p(m) =J_(A3y«-i 

and the expected cycle length is 24 x 575 = 13,800. 

As this crude theory predicts, the cycle length does depend on the 
starting position. The theory, of course, breaks down as the distribution 
of U n cannot be of the form assumed since there is a limited amount of 
variability in the machine represented by the initial wiring. 

In practice, the contacts on uniselector 1 are divided into 20 groups 
of 10 randomly disposed, and r of these are earthed if the probability 
of a one is required as 5r% Switches associated with each machine and 
with the arrival point are used to determine the groups used for each 
type of event and are chosen differently for each type. 

Thus the stepping of uniselector 3 depends on the last use made of 
the mechanism. This further complication seems to lengthen the cycle 
although the crude analysis alone would not predict this. 



CHAPTER 6 


PSEUDO-RANDOM NUMBERS 

As the complexity of the sampling experiments performed increases, 
the possibility of computing errors increases. To repeat the calculations 
will require the random numbers used in the initial calculation to be 
available for the check calculation. Using tables of numbers and manual 
methods of calculation, this is possible, but tedious. 

If automatic calculation is used, the storage of the large volume of 
random numbers used becomes a serious problem. It is this storage 
problem that has led to the abandonment of the use of tables of ran¬ 
dom numbers in computers. The use of the physical generators des¬ 
cribed in the last chapter will not eliminate this storage problem if 
check calculations are required. 

We shall see later that the ability to repeat the sequence of the random 
numbers has valuable advantages in increasing the accuracy of sampling 
experiments. 

Thus the need arises for a process that will generate a ‘random 
number sequence’ which will be reproducible. This is clearly a complete 
contradiction of the meaning of a random sequence. The reproduci¬ 
bility of the sequence implies the possibility of prediction of the sequence 
and constitutes the proof of its non-randomness. 

However, the advantages of reproducibility demand an investigation 
if it is possible to obtain approximately random sequences by a deter¬ 
ministic process. In principle it should be possible to define certain 
criteria required of a sequence that is to be used as a random sequence 
and then construct a process that will satisfy the criteria. Such a 
sequence is called pseudo-random. 

An obvious criterion required is that each element of the sequence is 
bounded and that all possible values within the bounds will appear 
equally often in the sequence. This is clearly insufficient since the 
sequence 

1, 2, 3, ..., n 1, 2, 3, ..n 1, 2, ... t n 
satisfies this requirement and yet is clearly not random. 
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A second criterion required might be that the serial correlation of 
consecutive elements in the sequence is zero. However, the sequence 

1.2,3, ...,2n 1, 2b, 2, 2b-1, 3, 2b -2, ..., n 

» +1.1.2, 3, ...,2b, ... 

will have this property for large n and yet it is clearly not random. 

e may extend the requirements by demanding higher and higher 
serial correlations to be zero or more strongly that longer and longer 
sequences are equally likely. longer 

Again, this will not prevent an obviously non-random sequence being 
accepted as pseudo-random. The sequence 4 8 

1, 2, 2, 3, 3, 1, 3, 2, 1, 1, 2, 2, 3, 3, 2, 3, 2, 1, ... 

contains each of the 9 possible ordered pairs ( 1 , 1 ) n 2 ) (1 T> (7 n 

(2 > 2 ); ( 2 > 3 )’ ( 3 . !). (3. 2), (3, 3) equally often but is clearly cyclic.’ 

All deterministic rules for forming a sequence of bounded numbers 

tong cycle ^ ^ eSSentiaI probIem is t0 P roduc e a sufficiently 

The first attempts to generate pseudo-random sequences were based 
on arbitrary rules and the properties of the resulting sequences could 
not be predicted theoretically. The value of a rule was determined by 

obtained ^ Standafd teStS for random numbers to the sequences 
6.1 The Mid-square Technique 

This was first introduced by Von Neumann. A p-digit number a 
is squared and from the resulting 2 p digits the mid digits are taken as x° 

take^^^and ^ ^ SqUared and the process repeated. As an example 


*o = 76 then x 2 0 = 5776 Xl = 77 

The sequence is 


76, 77, 92, 46, 11, 12, 14, 19, 36, 29, ... 

If the radix used is r then there are r 2 ’ possible values of a and conse¬ 
quently the sequence must ultimately repeat some previous value 
From that point the sequence is repeated—i.e. it is cyclic. In practice, 
ffie cycle ength is considerably less than the theoretical maximum r 2 “ 
The length of the cycle is dependent on the starting value x 0 . Certain 
values can lead to a zero term when the cycle length becomes one 
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These objections are sufficient to condemn this method, but a more 
serious objection is that the sequences obtained do not satisfy the 
primary requirement of a random sequence that any value in the per- 
mitted range is equally likely to occur. 

This can be shown as follows. 

The truncation of the digits from the back of (in forming ^»+i) 
is merely to limit the number of digits that are handled and in a theoreti¬ 
cal analysis we merely consider the front truncation. 

Then 

x n+1 = r p (x 2 n modr p ) 

Put x\ = u r~ p = h m = r p = ^ x„ +1 = cd 

Then mh = 1 

If u and v have p.d.f.’s p(u) and Pl (v) and c.d.f.’s P(u) and P±(v) respec¬ 
tively, then 

Pl (w) = h[p(co) + p(a> + h) + p(co + 2 h )... p(oj + m -1 hj] 

~ p(a> + x) dx = P(<Q+l — h) — P(cQ) (1) 

Jo 

Now if x n has a uniform distribution, we have 
P(u) = iu~* , P(u) = 'Ju 

P x {u ) = u + V n(V u + 0(w)) + 0(hu) 

Thus x n+1 has a skew distribution favouring low values of x n + 1 . 

This bias in * n+1 will create a greater bias in x n+2 and will cause a 
continuous degradation of the sequence. 

6.2 The Mid-product Method 

In an attempt to remedy this defect, an extension of the mid-square 
method has been proposed. Two starting values x u x 2 are given and 
their product u is formed. The mid digits of u are used as x 3 . The process 
is now repeated on x 2 , x 3 to form x 4 and so on. 

The distribution of u (assuming a uniform distribution on (0, 1) for 

u r and u 2 ) is 


p{u) = -logw 
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Applying formula (1) we obtain 

p ! (u) = u — hu log u + 0(hu) 

This process has less bias than the mid-square method and due to 
the need for a pair of terms to repeat for a cycle to restart usually has 
a larger cycle than that method. 

However, these two processes do not now receive much favour 
because of the known bias. Various other transformations followed by 
truncation could be adopted, but all will lead to bias. Within the 
approximation of (1) we require to determine a transformation /(*„) 
of x n so that x n+1 given by 

« =/(x„) 

x n+1 = r p (u mod r~ p ) 

will have a uniform distribution. 

Within the limits of the approximation (1) this implies 

P(oj +1 — /z) — P(oj) — A 

for a suitable range of oj and some constant A. 

Over this range P(w) = a + for suitable a and and hence 

p(u) = p 

Thus the distribution of u is uniform and so the only permissible trans¬ 
formation is 

f(x) — kxA-l (k, l constants) 

The case / = 0 has been suggested for completely different reasons by 
Lehmer. 

6.3 The Lehmer Congruence Method 

Once it is recognised that a pseudo-random sequence will be cyclic, 
it is possible to require to generate a cycle of maximum possible length. 
In 1949, Lehmer made the suggestion to use the theory of numbers to 
devise such long cyclic sequences. 

The theory of congruences supplies such sequences. Consider an 
integer sequence suggested by the heuristic argument of the last section 

x n + 1 ~ kx n mod m 

where initially m and k are arbitrary integers. 

Then x n = k n x 0 mod m 
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For the cycle length we require to find the minimum n, for which 

k n = 1 (mod m) (2) 

(assuming (v 0 , m) — 1) 

Now the Fermat-Euler Theorem states that if ( k , tri) — 1 then 


= 1 (mod m) 


where cj)(m) is the number of integers less than and prime to m. If m is 
prime, cj)(m ) = m-l and then it is apparently possible to achieve a 
sequence of length m — 1. 

However, the theorem does not state that 4>(m) is the smallest integer 
for which (2) is true. It is clear that the smallest integer n with the 
property (2) is a divisor of for otherwise we can write 


Then 


(j)(m) = an + b , 0 < b < n 
= k an k b = k b = mod m 


i.e. b satisfies (2) 

in contradiction of the definition of n as the least value with property (2). 

It remains to choose k and m to make n as large as possible. If the 
maximum length cycle can be achieved the starting value v 0 will be 
immaterial, and if n cannot be made equal to m — 1 it is still desirable to 
make the cycle length independent of x 0 . 

However, there is a prior consideration in the choice of m. The opera¬ 
tion of taking a congruence involves a division and it is desirable to 
choose m to make this usually lengthy operation rapid. 

For a number in radix scale r choose an integer p and write 

x = yr p +z = y(r p -l)+y+z 
Then x (mod r p ) = z 

x (mod (r p — 1)) = y+z 

Thus if m takes either of the special values r p , r p -1 the division 
merely involves the separation of the digits of the number into two parts 
and at most the addition of the parts. Also since 

* = y(r p +l)+z~y = (j-l)(/' p +l) + (r p -.y)+z+l 
* (mod (r p +1)) = z—y z ^ y 

= r Pj r 1 +z—y z < y 
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Thus r p +1 is also a suitable divisor to eliminate any real work in 
division. 

If m — p a iP a 2 2 ... p° k 

where Pl ,p 2 ... are primes and a L , a 2 ... are positive integers, then 

With most computers working in the binary scale, r = 2 and we are 
0(2 ' , ~ 1) and ^ (2,,+1) ' From (3) k foiiows that 

It is possible to show that in fact for m = 2 P the maximum cycle is 

of length 2 P and the only value of k giving this cycle must satisfy the 
conditions 

k = ±3 mod 8 
x 0 odd 

A suitable choice of It is an odd power of 5 since 

5 2 « + 1 = (l + 4) 2 « + 1 = (1 +(2^ + 1).4) mod 8 = -3 mod 8 

In this case, p should be chosen as large as possible and hence equal to 
the number of digits in the number stored. After multiplication by k 
the least significant p digits of the double length product are used as 

*»+i- 

Values for this scheme which have been quoted in the literature are 

fc = 5 13 p = 36,39 
k = 5 17 p = 40, 42,43 

For decimal machines we are concerned with 

0( 10 p ) = 4>(5 p )(j>(2 p ) = 2 p+i 5 P_I 

There is little in the literature concerning the theory for this scheme 
but the following values have been used 

k = 7 p = 10 

= 7 4o+1 p=ll 

= 3 19 p — 20 

The case m = 2 P - 1 is more complex. The longest cycle will be given 
when p is chosen so that 2 P — 1 is a prime when the cycle length is 2 P — 2. 
Now 2" -1 is prime only if p is prime since 

(x p “- 1) = ( x p - \){x*«- + x p(q ” 2) ... +1} 
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The numbers 2 p -l for prime p are known as Mersenne numbers and 
it has been established that the only prime Mersenne numbers are 
given by 

77 = 2,3,5,7, 13, 17, 19,31,61 ... 

Thus for this case longer cycles may be achieved by using a value of p 
less than the word length. The practical values of p used are 19 and 31 
giving cycle lengths of 524,286 and 147, 483, 646 respectively. 

In the case of wi — 2 P — 1, the choice of a suitable value for k is not 
amenable to a complete analytic solution. A process of trial and error 
for small k is usually used. Any power of such a k can also be used 
providing it is prime to the period m — 1; also a value congruent mod tn 
to such a power will also give the maximum period. 

The commonest combination is that given by p = 31, m — 2 —1, 

k = 13 13 =s 455,470,314 mod (2 31 -1). 

x n+1 = 455,470,314*,, mod (2 31 — 1) 

Any value in the range 1 ^ * 0 ^ 2,147,483,646 may be used for 
starting the sequence. 

This sequence is usually the best for fixed-point binary machines. 

The case m = 2 P + 1 is not usually so satisfactory. This is only prime 
if p = 2 q for some integer q. The numbers 2 iq +1 are known as Fermat’s 
numbers and the first four, 5, 17, 257, 65,537, are the only known prime 
values in the sequence in spite of investigation at least as far as q = 73. 

Thus composite m must be used and the one useful case is given by 
p = 29. 

m = 2 29 +1 <j>(m) = 3, 033, 168 

and k = l n - 366,714,004 

x n + 1 = 366,714,004*,, mod (2 29 + l) 

This is useful in a floating machine (such as the Ferranti Mercury) 
where the mantissa is restricted to about 30 binary digits. 

An alternative sequence sometimes used is 
p = 35 

m = 2 35 + l = 1, 034, 040 

/c = 23 

x n+1 = 23x„ mod (2 35 + l) 

For decimal machines, values recommended are 

p = 8 m = 10 s +1 <Km) = 5, 882, 352 
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The heuristic argument of the last section suggests that the most 
general transformation that would give a usable cycle is 

x n + i — (kx n + /) (mod m) 

lU f is P° ssib,e for tbe sequence to include the value zero, and a 
cycle of length m is possible. 

If m = 2 P a cycle of length 2'" 1 can be produced by the recursions 


x n+i ~ (4k-hi )x n 4 - k (mod 2 P ) 
anc ^ x n + 1 = (4k -h 1 )x n + 3 k (mod 2 P ) 

provided k is odd. 

These schemes all have the property that each of the permitted 
numbers appears once and once only in each cycle. This however does 
not indicate how the values are distributed in the cycle. The choice of a 
large value for k is to prevent a small value of x n being followed by the 
small values kx„, k x„,... A large value of k causes kx„ to exceed m 
and so x n + i is not necessarily small. 

Some indication of the distribution of the values in the sequence is 
given by the serial correlation of the sequence. If the parameters k and / 
are chosen so that the cycle is of full length m, it is possible to estimate 

thic 


. f "'” 1 f / m -1 

P( x m *« + l) = < X (Vn + l) — — ( X X n 
(n=o m\^ n ~Q 


Now 


m-l m-1 

X x n = X r ~ im(m-i) 

n — 0 r = 0 


z 


n = 0 



m 


I 




m-1 m-1 

X X n = X — l)(2m — 1) 

n = 0 r = 0 


since the sequence {x r } is merely a rearrangement of the sequence 

0 , l, ..m -l. 

m-1 

it remains to evaluate £ 

kx n -hi — q n m + r n 
0 <= r n < m 

X n+ 1 s=: f'n 

^ x n x n +1 = 2 x n (kx n -h l — q n m) — klx% + llx n - mLx n q n 
Thus the problem reduces to evaluating I.x n q n . 


Put 

where 

Then we have 
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Now if q n = s then satisfies the inequality. 

0 < kx + l — sm < m 


sm-l < < sm-J + m 


i.e, there are approximately ^(=psay) consecutive values of * for 

each , The possible values of i are 0, 1, ..The highest value k 
only occurs once at the end of the cycle. Neglecting this we have 

k -1 ii -1 9 k ' 1 2 jU(jU—1)^0 

IX, - s s I (*!« + *) = s ? 0 s l s 

From this an approximation to p can be found, neglecting terms in - as 


p(Xj,, +1) 


1 _a_ 

k km 


This is only valid for small L ., , 

A more accurate formula involves the least positive residues r q of 

the numbers l—qm. 

k 

i P t 


Then if l < k 


, 1,12 (S k 

P(x,,x. + 1 )-- + Hp 4 


, 1 

p(x„,x, l+i ) k mk \^ m j m y k 2 4 J 

Greenberger, who is responsible for the later formula has developed 
techdq™. for toning »» »*“ '““f” ’“J 

r,fh™n«to tongiii U no g»to= 4.. A. 

will be suitable as a generator. 

Another example quoted by Greenberger is more satisfactory. Again 
m = 2 35 k = 2 18 + 1 /= 1 P< 2-18 
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The analysis will also give higher serial correlations by noting that 
^n+m is given by 

x n+m = + mod m 

(k-1) J 

If this generator also gives a complete cycle, then the formula can be 

applied with k replaced by k m and / by — .3 

k — 1 

This interesting development has not yet been extended to cases in 
which a full cycle is not generated. 

However, even if the correlational properties of the sequences are 
satisfactory, the main objection to them for sampling purposes is that 
once a value has occurred in the sequence, it cannot occur again in the 
present sampling experiment. This denies the principle of independence 
of the items in the sample drawn. 

In practice, this is readily overcome by truncating each number in 
the sequence, removing, say, the last 7 digits. Now a truncated value 
can occur exactly 128 times in a cycle and the number of times it will 
appear in a short stretch of the cycle will vary. The appearance of 
independence has been restored. 

6.4 Second-order Recurrence Processes 

The mid-product method produces a larger cycle than the mid square 
in general, and this raises the question if a second order process 

^n + l =/(^,^_t)mod /?2 

will produce a longer sequence. The simplest possibility is 

fipc m X n _ j) X n -j- X n _ i 

x n+L = fc + x^Ofmodm) 

Any sequence defined by 

X n+ 1 ~ Aj + Aj-I 

is known as a Fibonacci sequence and has many interesting properties. 
The most well-known sequence is given by v 0 = = 1 and is given by 

the formula 
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The general sequence is 

= u n . 1 x 0 + u n x l 

The cycle length is given by the least n satisfying 

(m„_ 1 -1)x 0 + m m x 1 = 0 mod m 
u n x 0 + (u n+l -l)x 0 s 0 mod m 

which reduces to 

«?-(«„+1 - !)(«»-1 - 1 ) s 0 mod m 

since at least one of x 2 , is non-zero. 

Now it can be shown that 

Mi5-U n +lU„-l =(-l)" _1 

and the condition reduces to 

v„ = 2h„_, + u„ = (-l)" -1 -1 (mod m) 

Now for odd n 

2 m -\ = 2[(1 + V5r + (1-V5) m ] - l+ m C 2 5 + "'C 4 5 2 ... 

= 1 mod m 


since m C r = 0 mod m 

By the Euler-Fermat theorem, if p is prime, 2 P ~ 1 = 1 modp and thus 
for prime p the condition becomes 


v p = 1 mod p 


For m an odd prime, the maximum cycle of length m cannot be 
achieved. The complete theory involves the theory of primitive roots of 
congruence equations which is beyond the scope of this book. 

The additive congruence method has not found much favour. 
However, Page has pointed out that since 



*o + 



l+A y 1 


x 0 + 


VL vi-Vsv- 1 


x n c* Ax n 




x n behaves like the solution of * n+1 = Ax„, i.e. asymptotically the 
method behaves like the multiplicative congruence method with an 
irrational choice of constant multiplier. 
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If the pseudo-random numbers are used as a source of random digits 
rather than as approximations to uniform random variables some care 
is necessary. If congruences are taken with respect to 2 m , clearly defined 
patterns are created in the trailing digits. 

For the multiplication congruence method 

x n+l - kx n rnod2 m 

y »+1 ~ x n+t mod 8 = Iy n mod 8 where l ~ k mod 8 

For a long cycle / - ± 3 * 0 s 1 mod 2 

l 2 s= 1 


Thus the last three binary digits can only take two values, y 0 and 
3^o (mod 8) or y 0 and 5>> 0 (mod 8), i.e. two digits have fixed values and 
the other alternates from term to term. 

By taking congruence to 2" a more general result can be obtained. 
Put k' — k mod 2 P . Let a be least positive solution of 


k' a = 1 mod 2 T , y n = x „ mod 2" , y n + 1 = k'y n mod 2" 

The only values of y„ are y 0 , k'y 0 , k' 2 y 0 , . ..,k’“y a . Thus the last p 
digits are cyclic of order 2 P ~ 1 . 

Similarly with the additive congruence method if y n = x „ mod 2 P 


y» +1 = (y n +y»-i) mod 2 P 

and T.+J will be cyclic of period at most 2". We have shown that for 
odd prime p the period will be less than 2 P . 

If congruences are taken to an odd modulus this phenomenon largely 
disappears as the congruences to the composite modulus 2 p m will give 
a more complex structure. 

However, the last digit in the multiplicative process will be alternately 
odd and even if k is odd and in the additive process it is cyclic of period 
three (even, even, odd). 

. If ( rand ? m di « its are required from such pseudo-random numbers a 
purifying process such as the rejection technique described in the 
chapter on physical random generation is advised. 


6.5 Tests on Pseudo-random Numbers 

In spite of the doubts expressed in many quarters of the validity of 
the . S . e u u > nbers as a source of randomness, there are surprisingly few 
published accounts of the results of applying the standard tests to 
numbers generated in this way. 
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The first of these, by Johnson, concerned the multiplicative con¬ 
gruence method with 

& = 23 m = 2 35 +1 * 0 = 10, 987, 054, 321 

A sample of 112,000 35-bit numbers were paired and the resulting 
70-bit numbers divided into 7 10-bit numbers. The resulting 392,000 
numbers were divided into 28 blocks of 14,000 numbers and each block 

subjected to three tests: . 

(i) the distribution of 1,024 distinct numbers against the theoretical 

uniform distribution; 

(ii) the number of l’s in the 140,000 digits; 

(iii) the poker test. 

The resulting % 2 for each block were then tested by a % * or 
agreement with the theoretical y 2 distribution. On all three tests, the 
sample passed this composite test. 

Thus the evidence of this experiment is that this particular sample 
behaved under test similarly to real random numbers. 

A similar investigation has been made for the multiplicative con¬ 
gruence method k = 13 13 , m = 2 31 -1. No details have been published 
but the conclusion was the same as in the preceding case. 

The third account, by Davis and Rabinowitz, concerned the use of 
random numbers to pick a random point in an n-dimensional cube. 

The test used was that the correct proportion of such points lay inside 

the inscribed ^-dimensional sphere. 

Three methods of generating pseudo-random numbers were used: 

(i) the multiplicative method, 

k = 517 m = 2 42 x 0 = 1 

(ii) the additive method, m = 2 44 , x 0 = 0, = 1, selecting every 

other value generated; 

(iii) the same method, skipping 1, 2 and 3 terms depending upon 

whether the last hexi-decimal digit of the last number selected 
was 1, 2 or 3 (mod 3). ^ 

Starting values x 0 = [2 42 .ti] *1 = 5 

The conclusions of this test are that methods (i) and (iii) are satis¬ 
factory, but not method (ii). 



CHAPTER 7 


ELEMENTARY SAMPLING EXPERIMENTS 

The original sampling experiments, conducted by the early stati¬ 
sticians such as Student’, Galton, Karl Pearson and others of his school, 
were concerned with determining the form of the distribution of 
statistics. The object of these studies was to determine the amount 
and form of variability that some estimate of a parameter of a j^asic 
distribution would have in repeated sampling. With this knowledge, 
it was possible to make statements concerning the likeliness that a 
particular sample obtained from some experiment could have arisen 
from some hypothetical distribution. 

As an illustration of this process, consider an estimate for the mean 
of a normal distribution, given the sample of the n values of such a 
distribution. What estimate of the mean should be used, and how will 
it be distributed in repeated samples of n ? In fact, as is well known, the 
average of the sample values is the best possible estimate of the mean 
and its distribution can be shown by analytic methods to be normal. 
If the standard deviation of the original distribution is a then that of 

the mean is —j=- m 
sn 

Now suppose the particular sample of say 16 values from such a 
normal distribution has an average of 1-29 and it is known that the 
standard deviation is 2-0. The standard deviation distribution of the 
mean of repeated samples of 16 values will be 0-5. If we require to test 
if the original distribution has a zero mean, we ask if it is likely that 
such an extreme value as 1-29 could arise from a normal distribution 
of mean zero and standard deviation 0-5. Consulting tables of the normal 
probability interval we find that 

Pr(|*| S 129) = 0-01 ( 1 ) 

i.e. only once in a hundred samples will a value as extreme as this from 
the expected occur. 

The assumption of a zero mean is unlikely. It is not, of course, dis- 
proven, but in any situation in which indefinitely large departures from 
the expected values can occur, no hypothesis will be proved or dis- 
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proved, but will only be shown to be likely or unlikely on the evidence 
of the sample given. 

It was arguments of this kind that led to the modern theory of signifi¬ 
cance testing, then the Neyman-Pearson theory of hypothesis testing, 
and in modern times to statistical decision function theory. 

The simple example above made an assumption unlikely to be realised 
in practice: a question is asked about the mean of a distribution when 
its standard deviation, the measure of spread, is assumed known. 
What if the spread is also unknown? This will also have to be estimated 
from the data. A natural estimate for the expected mean square devia¬ 
tion will be a sample mean of the squares of the deviations from the 
sample mean. In symbols, if the sample values are 

xy, Xii ^ x n 

then 


s 2 = - t (x/-x) 2 
n i= i 

In practice, it is easier to deal with ns 2 rather than s. Tt is possible to 
obtain the theoretical distribution of the former quantity by mathe¬ 
matical analysis. This is the well-known x 2 distribution and its degrees 
of freedom are n — 1. Thus we have 

E(ns 2 ) = (n- 1)<7 2 

and so an unbiased estimate of <r 2 is given by ns 2 /(n - 1). 

Now if a given sample of 16 values from a normal distribution of 
unknown mean and standard deviation has a mean of value 1*29 an 
estimate of a is required before the probability statement (1) can 
be made, but this estimate is liable to error. The result of substituting 
a wrong value of a will lead to a false value of the probability and to a 
false conclusion. How is this dilemma to be resolved? 

In 1908, ‘Student’ recognised that the correct approach to this prob¬ 
lem is to form a statistic 

* = ^ ( 2 ) 

V ns 

and to determine its sampling distribution. He actually derived the 
analytic expression for this distribution and this achievement is often 
regarded as the foundation step of modern mathematical statistics. 
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Whether, from doubt of the validity of this argument, or for the pur¬ 
poses of convincing the readers of his result, he also established the 
distribution by a sampling experiment. Since the number of samples 
that can be drawn will be limited by time and energy, the resulting 
theoretical distribution cannot be reproduced exactly. In practice, a 
histogram is found, and the shape of this histogram is compared with 
the theoretical distribution. Suppose we are concerned with a sample 
size of 16, then samples of 16 observations from a normal distribution 
of mean zero and standard deviation 1 are drawn. From these the 
statistic t given by (2) is calculated. This is repeated, say, 500 times and 
the resulting 500 values of t are formed into a histogram. The choice 
of the number of class intervals and their width can be chosen for the 
histogram by inspection of the resulting values of t. The results of such 
an experiment are given in Table 8. 


Table 8 


t 

/ 

0 *0- 

37 

0 *2- 

36 

0*4- 

40 

0 *6- 

24 

0 *8- 

28 

1 *0- 

25 

1 *2- 

12 

1*4- 

19 

1 *6- 

10 

1 *8- 

7 

2 *0- 

9 

2 *2- 

5 

2*4- 

3 

2 *6- 

1 

2 *8- 

2 


' / 


-0*2 

39 

-0*4 

41 

-0*6 

32 

-0*8 

39 

-1*0 

19 

-1*2 

17 

-14 

14 

—1-6 

8 

-1*8 

11 

—2*0 

3 

-2*2 

8 

-2*4 

3 

-2*6 

5 

-2*8 

3 


If the resulting histogram appears to be too ragged, then the choice lies 
between choosing wider class intervals, thereby sacrificing definition 
of the distribution or taking further samples. It is very desirable to have 
some objective method of obtaining an estimate of the sample size 
required before the sampling starts. This must be chosen to give a 
prescribed accuracy to the estimates of some frequency class. Supposing 
the true probability of falling in a particular class has a value p Then 
given sample size n, the number of sample values falling in this class 
will have a binomial distribution of mean np and variance npq. If n is 
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large enough, as it will usually be in practical experiments, the distribu¬ 
tion may be approximated by a normal distribution and thus the 
probability of any arbitrary number falling in the cell can be easily deter¬ 
mined. However, it is not the distribution of the absolute number of 
elements falling in the cell that is required, but the accuracy in which 
the class frequency can be estimated. The relative error in estimating 
the class frequency is given by (x — np)jnp where v is the number appear¬ 
ing in the class. This has an approximately normal distribution with 
mean zero and standard deviation sjqjnp. Thus to find the probability 
P of maintaining a given level of accuracy a we have to determine n 
from the equation 


e~^ t2 dt 


where 


‘-VY 


This can be written in the form 


i<j>“ 1 (P) = X say 
a 


This is shown for suitable values of K in Figure 13 and illustrates that 
the smaller the class frequency we try to estimate the less accurate is 
our estimate. The sample size n should be chosen to give the required 
accuracy on the smallest cell frequency required. In practice, the situa¬ 
tion is more complex because several cell frequencies are being simul¬ 
taneously estimated and excesses in one cell over expectation must be 
balanced by deficits in other cells. 

Given a sample distribution, it is natural to test a hypothesis that this 
sample has arisen from some hypothetical distribution. In Student s 
case, for example, his theoretical distribution, given by 


VOV^T 


2\d+1/2 
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where v is the degrees of freedom, would be the natural hypothesis. 
From this distribution the cell frequencies are calculated by quadrature 
and a comparison made by a x 2 test. 

7.1 Bivariate Distributions 

In a similar way bivariate and even multivariate distributions may be 
sampled. No new principle is involved here but two or more statistics 
for each sample are calculated and the bivariate histogram developed. 

In the same way, the sample size required for any given accuracy is 
determined by a similar calculation. However, the marginal distribution 
of each of the component statistics will also be of interest and must be 
reasonably well defined. It thus follows that the individual cell fre¬ 
quencies of the bivariate distribution are very much smaller than that 
for the corresponding marginal distribution. This implies that the total 
sample size required is very much larger in this case. Roughly the total 
number of cells required in the bivariate case is the square of that for 
the univariate case. In practice the resulting distributions obtained from 
such bivariate sampling experiments are too crude to give significant 
conclusions. 

However, bivariate distributions are of great value in the more 
efficient estimation of irregular distributions. 

7.2 Improving the Accuracy of Estimation of Sampling Distributions 

In many of the situations, there are alternative statistics that may 
be used to estimate the parameter and in investigating by sampling 
experiments the distribution of one of these, it is a simple matter to 
calculate the other statistic at the same time. If both of these are 
useful estimators, we are likely, in successive samples, to obtain similar 
values from these estimates, i.e. these estimates are correlated. It is 
thus possible with very little extra work to obtain a bivariate distribution 
of two correlated statistics. 

Now suppose that the marginal distribution of one of these statistics 
is known theoretically, then there must be some evidence concealed in 
departures of the actual sample distribution from the theoretical, and 
it should be possible to correct the sample values to take account of the 
deviations and thus improve the accuracy of the estimation of the 
sampling distribution of the first statistic. This technique was developed 
by Fieller and Hartley, and is illustrated in their paper by the joint 
distribution of the range and the sample standard deviation from inde¬ 
pendent samples from the normal distribution. 
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We sub-divide the range for the two statistics x and y , say, into 
intervals and denote the number in a sample falling in the fthy-category 
and the y'th x-category by n ijt The probability associated with this cell 
is denoted by p iJt The marginal frequencies are given by 

= P-j = '£p lJ 

n i=X n u Pi- =X Pu 

3 j 

" = =I«-y 

1 j 

Also T,p t --'Lp.j = i 

’ j 

The conditional probability of a sample falling in the cell ft /) given 
that it falls in the columny isp tj /p.j, and is estimated by fiijjn.j. 

Now = and Vi . = £ PiJ = lp . Ell 

P-i J p.j 

and so an estimate of p,. is 

A = Zp-A 

j n.j 

when p.j is known. 

Unfortunately if n. } is zero, so must n tJ be, and the ratio is ill defined 
In this case the estimate 


Pu = P-r 


is used. 


The complete estimating procedure can be summed up in the equation 

Pi = Zp.jiUj 

where 


W;, = - bL 


n.j > 0 


n 


n ’j- 0 


This formula leads to a slight bias in p { since the expected value of u -. 
is biased. In fact, it can be shown that u 

= PuiP.j+v-p.jr-'ip,. -Puip.j) 








Hence 


The bias 
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Zn-p-/ 1 '(Pij-Pi-p.j) 


(Pij-Pi-Pj) 
1 


‘ j 


I 


-nP-j 


This can be made negligible if the sample size n and cell boundaries 
are chosen so that np. } ^ 8 (say). This can be achieved in advance of 
sampling since/;.,, are known. of 

T^Zt an r/ thk , f r u,a iS ’ ofcourse ’ a reduce d variance of A. 

This, by some tedious algebra, can be shown to be given by 

V(P,.) ~ p,.(l-p, (p.j-pi.p.f/np. 

j 

+ 1 Pu^-P-j)(l-p iJ /p.j)/n 2 p. J +Y d (l-p. J )"- 2 (p iy _p..p f 

The last term of the expression may be ignored as it is of the same order 
magnitude as the bias. The third term can be ignored relative to the 

fhe^ec d 86 The „ firSt item is the ordina 'T binomial variance and 
the second represents the main variance reduction. 

* and V are independent then, by definition, Pij =,p ( .p.. and the 
vanance reduction is identically zero. At the other extreme of complete ' 
epen ence w ere Pij <5 iy p. y - S ijPl . then the estimate p, has zero 
variance. In the intermediate cases, if * and y are highly correlated 
there will be a considerable reduction in the variance of pi 
We illustrate this procedure with the estimation of the distribution 
of a median of a sample of three items from a normal population, using 
the known distribution of the mean as a control variable. 8 

t ° n f 86 92 8 ‘ VeS the reSuIt for 8,9 Sam P les of 3 and 

ta a ke U n a riom tables ^ Ce,1 ' fWqUCndeS and the true valu - 



CHAPTER 8 


FLOW DIAGRAMS 

The description of sampling procedures in ordinary English is rather 
word-consuming, and it is desirable to develop an abbreviated language 
for these descriptions. Further, since for large-scale sampling experi¬ 
ments computers are almost a necessity, it is desirable to develop a 
language that is suitable for describing the problem for a programmer. 

This is achieved by the flow diagram. There are several conventions 
used for these diagrams, which describe the successive steps of the calcula¬ 
tion, and we shall adopt the system involving minimal conventions and 
as general as possible and not oriented towards any particular computer. 

The success of the automatic computer hinges on the fact that all 
extensive calculations can be described as repetitions of cycles ot 
comparatively short calculations. Thus, each order in the programme is 
used many times over. If the programme had to consist of a string of 
orders obeyed serially, the time to transcribe the orders might easily 
exceed that necessary to do the calculation by hand. Such cycles of 
orders must not be repeated indefinitely, and so the order code of all 
machines contains orders that allow tests to be made on the quantities 
produced by the calculation, and as a result of the tests, the order of 
execution of the orders may be determined. 

Thus the programme consists of blocks of orders which.effect the 
calculations required, each terminated by a test (or tests) which directs 
the computer to the appropriate next block. 

We shall use a rectangular block to represent the calculating orders 
and will write a brief description of the calculation in the block. The 
test at the end of each block will be written in an oval and branches 
from the circle will indicate the possible blocks of programme next 
executed and the conditions necessary to follow each route. 

Thus to print the 100 means, each from a sample of 5, from a normal 
distribution, the flow diagram would appear as in Figure 14 (a) on page 95. 

Usually, but not universally, the choice of routes at each branch 
point is limited to two, and in this case the route followed when the test 
named is satisfied is marked with a 0 and the route followed on failure 
with a X. 
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is immediately followed by a test. In these cases, the abbreviation ‘ct m’ 
will be used for both the count and the test. ... 

When the calculation is set out in this abbreviated notation, it is 
often easier to see possibilities of improvement. In the example, the pair 
of operations A" = 0 m' = 5 occur twice and this enables the procedure 

to be modified to: 



Figure 15 

It is not always necessary to specify each process in full in one flow 
diagram. In the example of the last chapter used to illustrate the use 
of the mean as a control variable, the determination of the median in 
terms of elementary operations is quite difficult. However, the mam flow 
diagram can dismiss this by a single statement. 



Several additional notational points are raised by this example. The 
suffix on the variable Z is an index adjusted by the programme. The 
frequencies being collected are indexed by p determined by the variables 
X Y which involves a further calculation. 

The calculation ‘T = median of Z 0 .Zi.Z 2 ’ has its own sub-flow 
diagram as shown in Figure 17 on page 97. 

The reader may find it of interest to examine the case when two or 

more Z’s are equal. 

For large sample sizes, this simple technique would involve an inter¬ 
minable number of different tests and a more systematic method is 
required. 
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Consider the case of an odd sample size 2m+l. Then there are m 
sample values greater than or equal to the median value. A method 

found b Th 0 ^ e ? h Va ! Ue f tUm Until 0,16 satisf y in S ^is condition is 
found. This has two disadvantages: if the sample contains several 

qual medial values difficulties arise; information from former trials is 
wasted m later trials. The first disadvantage is avoided if both the 
number of sample values greater than the trial one, and the number less 
than it are counted. Both these counts must be less than or equal to m 


I nals can be avoided if bounds on the median are adjusted after each 
triaJ and values outside the bounds abandoned. 

The flow diagram to describe this is as follows: 



Figure 18 
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The box 'ALARM’ illustrates another feature and advantage of this 
approach. It makes it easier to incorporate tests for errors. In this case, 
the adjustment of the count on m is automatically followed by a test 
and its success clearly means an error, since it is now not possible to 
find the median. Without the test, the error would have continued to 
search over non-existent sample values to find the median. 

To illustrate the working of this cyclic method, consider the sample 
of 9 values (m = 4) 

At 0 12345678 

Z„, 7 13 5 5 9 9 8 5 8 

Rank 6 1 7 7 2 2 4 7 4 


The values of m, Z m , u, v, X, Y on each successive trial are tabulated 
below 

m Z m u v X Y 

initially 0 — 00 -oo co 
0 7 5 3 7 co 

1 13 0 8 7 13 

4 9 1 6 7 9 

6 8 3 4 8 8 


Similarly, a sub-flow diagram for the adjustment of the cell fre¬ 
quencies and the marginal frequencies is required. All computers have 
a linear ordering of their registers and the required two-dimensional 
ordering must be converted to a one-dimensional one. Suppose the array 
is Nx M, then the ( i , /)th cell frequency is stored in the 
pth = ((/'— l)lV+j)th 

register of a group reserved for that purpose. Denote these registers by 
U . Denote the registers holding the marginal frequencies by V-, and Wj 
where i and j are the cell labels with upper values / and J and suppose 
the common cell widths are a and b respectively, then the cell counting 
is merely 

i’ = Xja (unrounded division) 

i' = min (i, I) 

j' = Y/b (unrounded division) 

j' = min 0*. J ) 

p f = (i — 1)jV+7 

V p =v,+ 1 

v; = v,+i 

W'j=Wj+ i 
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Store the theoretical marginal frequencies for the mean in registers L r 
Then the flow diagram for the adjusting process will be: 



Print Wj 


w j = vVj+Ljx Up 


i.' =0 



p' =p + N 


P' = j 





W'j =0 


IfinTshI 


Figure 19 


Although this flow diagram procedure may seem rather cumbersome, 
it is an essential step in preparing a computer programme and even if 
the calculation is to be done by hand, it ensures that the detail of the 
calculation is understood before a computing schedule is drawn up. 
For the automatic computer, every contingency must be considered, 
and this also saves time on manual calculation when an unusual situa¬ 
tion arises. 

These diagrams will be used extensively in later chapters, but generally 
the detailed sub-flow will be omitted. 

The second stage of cementing these diagrams into computer pro¬ 
grammes or of drawing up computing schedules will depend upon the 
equipment being used and will not be considered further. 



CHAPTER 9 


ESTIMATION 

An earlier chapter has dealt with the problem of determining a 
distribution by a sampling process. Often, the aim of the experiment is 
less ambitious and only the mean (and perhaps the standard deviation) 
is required. A classic example of this arises in the use of range as a 
measure of dispersion. The range is clearly a monotonic increasing 
function of the sample size as increasing the sample size increases the 
probability of obtaining at least one large deviation from the mean; 
what is the functional relationship between the mean range and the 
sample size ? 



[finis hi 


Figure 20 
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In fact, this relationship is known analytically, but several object 
lessons can be learned from an attempt to evaluate this relationship by 
sampling. The first crude method would simply consist of drawing 
many samples of size n, evaluating the range for each, and use their 
a verage as an estimate for the graph of expected range against sample 
size. This would be repeated for various n. A flow diagram of this 
process would be as shown in Figure 20 on page 100. 

The calculation of the range follows a similar scheme to that proposed 
for the calculation of the median. 

It now becomes clear that 
this process can be combined 
with the sampling process and 
thus eliminate the storage of the 
sample values. Maximum and 
minimum values are adjusted 
after each sample. This raises the 
possibility of using the partially 
complete sample for one value 
of n to give an estimate of range 
from a smaller sample. Such Figure 21 

estimates will be positively correlated with each other, but this will 
ensure that differences between mean ranges for different sample sizes 
are estimated with greater precision. The amount of sampling required 
is reduced to that for the largest sample size to be investigated. 

A little care is needed with the initial conditions since a sample of one 
cannot give a range. An outline of the calculation is given in Figure 22 
on p. 102. 

We now turn to improvements in the sampling procedure. Suppose the 
mean of the distribution being sampled is known to be the value /i. If the 
mean of the sample chosen is greater than zero, we will expect a larger 
maximum than in the contrary case. If we were concerned with estimating 
the maximum of the sample, then a more sensitive statistic to use is 
U = max Z p - I,Z p /n E( U) = £(max Z p ) ~ fi 

Our estimate is then U+n U has smaller sampling variance than 
max Z p and so greater precision is obtained for a given sample size. 

Similarly K + /i is a better estimate of min Z p where 

p 

V = min Zj.-ZZp/n 




102 


THE ART OF SIMULATION 



Unfortunately, the estimate of the range 

U - V = max Z p - min Z p = range Z p =W say 
p p 

eliminates the mean. 

However, we can use this control variate method with some other 
measure of the spread. If the standard deviation of the distribution is 
known to be a we can use the statistic 

Hr - + a (1) 

p \ n — 1 

as an estimate of the range from each sample where Z = ZZ p /n. The 
adjustment to the flow diagram is 

(i) add two instructions U’ = 0, V = 0 to box A 

(ii) replace box B by 
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We may improve on this method by noticing that 

W-k /z(z,-Z) 2 

V n~\ ka (2) 

is a valid estimate of the range for any k and thus k can be chosen to 

and^so he Sa , mp h hng variance - We squire minimal sampling variance 

variance § ° ® ° f '* * ^ which makes the 'sampling 

variance minimum, i.e. we choose k so that P S 

Y(W-kS-W^~kS) 2 

is minimised. We can use the method of least squares to estimate k 
This technique will be discussed later on in the chapter. 

It is important to note that any error in this estimate from the 

mercW H Cal ° P ^ a&Ct ^ VaIidity ° f 0Ur estimate 0) but 

merely decreases its efficiency. } 

reader 113 " 863 * 0 ““ fl ° W dia§ram to USe this P rocedu re are left to the 

If the distribution being sampled is symmetric, we may make use of 
the sample mean in another way. We have 

E(W) = E{W\Z > fi).Pr(Z > n) + E(W\2 g M ).P r (Z g /i) 

If, as usual, there is no lump probability at Z = the symmetry of the 
distribution of Z implies that of Z so that ^ 

Pr(Z > fi) = p r (Z = i 

in a ™ ended t0 measure 2 for each sample and to stratify 
the data into the two classes. If the number of samples for which 

Z < /ns « 1; and the number for which Z ^ /i is n 2 (n = n, +n ,) the best 
estimate to use is «i+n 2 ;tne best 


1 (L W £ W \ 


These two variance-reducing methods can be combined bv 
with the statistics (1) or (2). y 


using (3) 


If the range is found, as proposed, from the difference of X = min Z, 
and Y = max Z p , a more effective stratification is possible. ” 
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E(W) = E(W\X > p, Y > p).Pr(X > n, Y > p) 

+ E{W\X p)-Pr(X £ p,Y> p) 

+ E(W\X ^ p, Y ^ p).Pr(X ^ p, Y ^ p) 

Now if X > p and Y > p then Z p > p for all p and the probability of 
this is for sample size m. 

Similarly, 

Pr{X £ p, Y ^ p) = Pr{X > P, Y > p) = (i) m 
and hence Pr(X = 1 ~ ®“" 1 

If the number of samples in the classes is given by. 

X> p,Y> p «i 
X ^ p, Y > p n 2 

x ^ ju # r ^ p «3 


then the best estimate is 


S I ^ 


Lt. ”P ^ . r< n\m-V\ X£/i,Y>H 

■ MJhIM —i+{ 1 —(i) )■- 


I W, 


The advantage of this stratification procedure is that the chance vana- 
Sns of the sample frequencies from their theoretical values are 
eliminated and this reduces the total variability of the resulting estimate. 
The two stratifications can be combined. It is clear that 

Pr(X > p, Y > p, Z g p) = Pr i X = ° 

and by symmetry 

Pr(X^p,Y>p,Z< p) = Pr(X =S p, Y > p, Z > p) = 

Then the estimate used is 

( Y, w P Y w p\ 

fr J x>£?r>P + X£t‘,r£f‘ - 

Kl> { rh n i ’ 

I y w p Y W A 

If the sampling distribution of an auxiliary correlated statistic is 
known, then stratification with respect to this statistic can be used. Fo 
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S the distl * Ution of ^ is know « for a normal distribution so 
fon ! f the ex P ectedvalue of range of a sample from a normal distribu¬ 
tion is under investigation, we may use the formula 

E(W) = £ E(W\Sj)P{Sj) 

*' e S l IS division of the range (0, oo) and P(S,) is the prob¬ 

ability that ^ falls into E(W\ Sj ), the expected value o V ubjea to 
the sample falling in S, is estimated by the average of the sample 
W s whose associated s 2 do fall in S f . P 

In large samples, the numbers falling into the different strata will be 

TheTH 1 " 1 ^ y pr f °P 0rti0nal t0 the Probabilities associated with them 
The advantage of the stratification is that the chance variability of these 
numbers no longer affects the variability of the estimate. 

tW TT 1 ,’ 11 “ Sti ° P °f ibIe ’ Particularly if a fine stratification is used, 
that a stratum may fail to contain a sample at all. In this case the 
estimate of conditional expected values is no longer available 
In the analogous case, when estimating a frequency distribution using 
ontrol variable, an alternative estimate was used and this led to bias § 

for ?he C In S H U t lder dlSCUSS !° n ’ * Simi ' ar expedient can be used substituting 
for the indeterminate values the unconditional average. 

. if™®,? 1 " l h , 1S WiH als ° bias the estimate > and an alternative procedure 
desirable. The simplest expedient is to sample until every stratum has 
a representative (and the minimum sample size has been reached) This 
leads to variable sample size and the exact statistical analysis of the 
accuracy of estimate is considerably complicated by this fact. In prac¬ 
tice, for large samples, the probability of a sample size beyond the 

minimum is low and the effect of the variability of sample size is of no 
importance. 

However, it may be more economical in time and energy to fix the 
sample size and modify the sampling procedure so that the sample size 
m each stratum is predetermined. 

The sampling techniques illustrated in this example can be genera- 
hsed and subjected to a certain amount of mathematical analysis. 

e . ^ note sample values x u x 2 , ... , x m by the vector x whose 
p.d.f. is denoted by p(x), Usually this is of the form 

m 

p(x) = n Kxd 

i= 1 

but this is not necessary to the arguments that follow: 
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The statistic is a function of x denoted by Z(x). Again usually this is 
a symmetric function of * 2> ..but this fact is not utilised in 
the main part of the theory. 

The quantity to be estimated is the expected value of Z } i.e. 

Z = | z(x)p(x) dx 

where R is the total region for which p(x) is non zero. Usually this is a 
m-dimensional Euclidean space but can, for what follows, be any region. 

The stratification is represented by dividing R into k sub regions 
Rj(j =1,2,..., /c). 

We define 

p.= [ p(x) dx the probability of a sample falling in Rj 
J JRj 

p.(x) = p(x)IPj xsRj Ithe conditional p.d.f. of x given that it is 


otherwise I * n 


Zj = Z(x)pj(x) dx 


Z(z)p(x) dx 


the conditional expected value of Z 
given that x is in Rj 


Given n samples * 1( * 2 , .... x„ the corresponding values of Z are 
denoted by Z„ Z 2 , ..., Z„. For a stratified sample containing n, items 
in R i the values are written as Z tj (j = 1, 2, ..., n h i - 1, 2, ..., )• 
The simple estimate of Z is 


2, = ixz r 

n 


Now E(Zd = Z and its sampling variance is 

* I* __ 

<T 2 = (Z-Zf = (Z — Z) 2 jp(x) dx = Z 2 p(x) dx - Z 

J R J R 

The variance of Zj is similarly defined as 

<rj = (Zj- Zj? = f Z 2 pj(x) dx-Zj 
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Now 


Thus 


Pj« 


Rj 


Z 2 p(x ) dx—Zj 


(Z — Zj + Zj — Z) 2 p(x) dx 

R 


JJ 


= (Z-Zj) 2 p(x) dx+ 


(Zj — Z) 2 p{x) dx 

R 


=IPjW+( z r z ) 2 } 

j 

n n U j J 


The estimate from a stratified sample is 


since 



E(Z iJ ) = Z i 
E(Z 2 ) = '£p i Z t = Z 

i 


i nf t n t 


On a freely chosen sample n t will be a variable but in large samples 
ca npi and 


V(t 2 )^-'£p i af<^ = V(2 1 ) 
n ; n 


v(2 t )-v(2 2 ) = l£ Pi (z,-Z ) 2 

n i 

If the sample is restricted so that the number in each stratum is exactly 
npi the above formula becomes exact. However, if restricted sampling 
is possible a better choice of the n t is possible. 

We require to minimise subject to = n. The optimum 

n t 

values of n t can be found by the method of Lagrangian multipliers as 
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The variance of this estimate 2 3 say is 


since 


v@ 3 ) 



V * VPi ) n 

Pi{Oi-d ) 2 + a 2 


v(2 2 )-V(2 3 ) = -^ Pl (c i -a) 2 

n i 


Often a restricted sample can only be taken by a rejection process. In 
this case, of course, nothing can possibly be gained by throwing away 
sample values except the labour of calculating Z values for the samples. 
It is very rare for the sampling process and the test if it lies in any 
given region to be negligible work compared to the work required to 
calculate Z. 

However, if a scheme of restricted sampling not involving rejection 
can be devised then the optimum stratification of the sample is reward¬ 
ing. The exact value of stratification cannot readily be stated since 
nothing can be said about the relative magnitude of the components of 
the partition 

a 2 = XpXZj - Z ) 2 + Zpi(of -S ) 2 + a 2 


The possibility exists in theory but does not often materialise in practice 
that a stratification can be devised from which the Zj can be calculated, 
but the pj are unknown. 

The sampling procedure then consists of determining for each sample 
the value of j, j(i), say, and using the estimate 

F(2 4 ) = 1X^,0 

n * 

Now V(Z m ) = ^ Pi 2 2 -Z 2 

j 

V(Z 4 ) = 1 I PjiZj- Z) 2 
n j 

Another possibility is that both pj and Zj are known for some of the 
strata (if known for all the estimation problem is solved). Then restricted 

sampling in the remaining strata should be used and the quantity £ p^j 

__ u 

estimated and the final estimate obtained by adding J] p^Zj. (K and U 

K 



109 


estimation 

5 XT “ in whid ’ ? ' an<i ^ *" k »™ ««■-»™ 

*■—*- - 

A final possibility is to make a random choice whether a samnled 

r ~ ~ “s » s 

with protabiltt„, ^StSlni proSmty 

z is calculated, but otherwise not. J ^ accepted, 


Put 

Then E(Z) 
Use an estimate 


Z — Zjqj if x accepted. 
= 0 if not. 


Ip. 

J 


^ + 0(l-4 J .)j = £ p .z. 


Now 





i Qj 


If we choose <?,. to minimise this quantity we obviously take n ~ l 

3^33£S35S§ 

I pA*j + «1 - q,)} = £ Pj(l - <f))q j + 4 > 

The optimum solution for qj is 

q = min (1, W^f) 

ttTe^d h vTe S0 that tHe 6XPeCted am ° Mt ° f WOrk - sam P' e takes 

If the strata probabilities are known and a restrirtoa „ 

r sible, pra is "" PlUE ’’ ro - 

W Flck a strata with probability pj. 

(n) Choose to sample this strata with probability q. 

Sss Tz M ’ r f ” - « - 
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If this splitting process is used, it is best to divide the strata into two 
classes^The first class is sampled by an ordinary stratified procedure 
and the remainder treated by the splitting technique. 

The general motivation for the splitting technique is that the strata 
so treated have very low <r/s and a few samples from a stratum of t 
kind will estimate Z, adequately, and the work of restricting a sample 
to the region is high and so the return on work done is low. 

The splitting technique is often referred to as_Russian rodette. 

As an example, consider the evaluation of E(e ) where R x +y 
and (x^) are th; co-ordinates of a random point m a square of side 2 
with centre the origin. Take k = 2 and define 
R, as x 2 +y 2 ^ 1 
R 2 as x 2 +y 2 > 1 — 1 = y = 1 

n n - I-™ 

Pr = 4 4 

By symmetry we may restrict the sampling to the first quadrant. 

Lisina stratified unrestricted sampling the flow diagram rea 



Figure 23 


For restricted sampling from R1 we take x with p.d.f. 

-yfl — x 2 (distribution Rl) 

71 

y uniform in (0, VCT) (distribution SI) 

From R2 we take x with p.d.f. 
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(distribution R2) 

and y uniformly in 1) (distribution 52) 


One scheme takes 


“[i“] 


n? ~ 


l-~h 
4 


+ 1 


The one is added to n 2 to ensure a sample size n, since -n can never be 

4 

an integer. 



Figure 24 


,.J he . optl , mum sam P ,e division for a restricted sampling scheme is 
difficult to determine since the evaluation of a 1 and a 2 is a more difficult 
problem than that of determining E{Z) = E(e~ RA ). Rough bounds can 
be found 

Ini^t 1 > Z > e~ 1 


In R 


-1 


The ranges are the ratm 1- e -V ( e ->-c-) ^ 1 and if we assume 

tne standard deviations in the same ratio, we take 


Jh = 


(c-l) 


l + Je-2) 


n 0 = 


-n 


l + Jc-2) 


+ 1 


To apply Russian roulette note that 


Ini?! l^Z 2 ^e ~ 2 
In R 2 e~ 2 ^Z 2 ^e~ 16 


Thus the splitting is done on region R 2 . This is also advantageous since 
the sampling from R 2 is slightly more difficult. The estimation of 0 is 
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rather difficult. Assume a value 0-1 and a 10% reduction in sampling. 
Then substituting in the general formula gives 

_ 3-2-0-971 
qi ~ 3-6 —0-97t 

The practical difficulties of estimation of quantities such as Z/, <jj, <t>, 
etc., make the application of restricted and split samples of doubtful 
utility except in cases where a stratification giving some order of 
magnitude differences in aj can be easily made. 

The evaluation of the relevant parameters required to design an 
efficient sampling procedure raises an estimation problem. In principle, 
the procedure should proceed as a two-stage sampling method in which 
the primary purpose of the first stage is to enable a second stage to be 
planned so that the overall sample will have been selected on an 
optimum partition between the strata. 

Unfortunately the variability of an estimate of a variance is usually 
higher than that of a mean and so the secondary parameters can only 
be estimated from a sub-sample with very poor precision. If the appro¬ 
priate optimum allocation of sampling between the strata involves very 
low sampling rates for some strata the original sub-sample may contain 
more than required in order to estimate the secondary parameters. 

The ideal procedure would be to make provisional estimates of both 
primary and secondary parameters as each sample (or batch of samples) 
is taken and to evolve the design in the light of the best estimates 
available. 

This discussion has assumed the amount of work (represented by the 
size of sample taken) that is available is fixed and has sought to utilise 
this to minimise the variability of the estimate used. This is artificial and 
it might be more appropriate in many cases to fix the variance required 
and design the sampling scheme to minimise the sample size required. 
Formally these two problems are identical, but in the latter the absolute 
values of the secondary parameters are needed, whereas in the former 
relative values will often be sufficient. 

Thus an evolutionary design should terminate when a required 
accuracy has been achieved. Sequential estimating schemes of this kind 
using stratification have received very little attention in the literature, 
but the simple sequential unstratified sampling scheme has received 
considerable attention. 

Since any sequential scheme has a stopping rule which makes the 
sample size variable, the actual standard deviation of the resulting 
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estimate will be variable. A different definition of accuracy is required 
and a confidence interval of specified length and confidence is employed. 

Suppose that the mean of a normal distribution N(jx, a) is to be found 
by sampling. The mean of ^-observations will have a normal distribu¬ 
tion NQi, Define t„ by 


1 

V 2/1J 


e"** 2 dx = 1 


then the probability if the estimate of the mean is x 9 that the interval 
{x-t^GjsJn , x + t a elsln) 

does not contain the true mean fx is a. If we require the interval to be of 
length / this gives the equation for n 



If this is not an integer the next highest integer should be used. 

This assumes that a is known. Otherwise this must be estimated from 
the data. 

If a fixed sample is taken the appropriate confidence interval is 
found by using Student’s distribution. Let t a now be the critical t-value 
for enclosing a probability of 1 - a. Calculate 

t = (x-n)\fn 

s 

This gives a confidence interval 


*- 7 = *+-/= 

v n v n 

The analogous sequential procedure would at first sight take a rule 
to stop sampling when 


n > 


where S n is the estimated standard deviation on n observations and 
the critical /-value for Student’s test on n degrees of freedom. However, 
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this is not theoretically correct as the probability of stopping involves 
the successive events 

Si ^ 2(i/) 2 /4 1 > 

Si ^ 3 (i*) 2 / 4 2) 

s 2 £ n(il) 2 l^ 

Since the statistics S u S 2 , ..S„ are not independent, the probability 
of this joint event is not easily calculated and is certainly not 1 -oc as 
supposed by the confidence interval argument. However, for large 
values of n , Anscombe and others have shown that the effect of the 
sequential nature of the process can be accommodated to a sufficient 
accuracy for practical application by a simple adjustment. 

The recalculation of S 2 after taking each sample is troublesome and 
can be avoided by rewriting the stopping rule as a restriction on the sum 
of squares about the mean. 

This can be done most conveniently by means of the Helmholtz 
transformation. 

Suppose the sample is x u x 2 , •. x„ 

n rt—1 

Then £(*i-*) 2 =E«i 

i- 1 i= 1 

where u { — ~ ■< ix i+i — Y x i 

i(i + l)( j “ 1 

Thus each additional sample taken enables a new u t to be calculated 
and does not change the values of the previous w-values. 

The modified stopping rule in terms of these n-variables is found to be, 

2-676 -itl) 

i=i 4 tl 

when t is the normal critical level. The elimination of the Student 
distribution arises from the assumption of large n. 

n 

The derivation of this result hinges on the fact that £ x t has for 

i- 1 

large n a normal distribution N(n, V no). The central limit theorem 
ensures that this is approximately true whatever the distribution on an 
individual x 1 may be. Thus the stopping rule has a wider validity and 
may be used for all estimation processes so far discussed, 


i ~ 1 , 2 , ..., n-1 
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The expected sample size can be found to be 

4 c 2 t\ 1 + t\ 
l 2 2 

and the second term represents the additional sampling required to 
allow for the unknown value of a. 

If it is desired to quote a standard deviation for the estimate, the near 
normal distribution of the estimate can be used and the equivalent 
normal standard deviation is quoted 

a=-L 

2t a 

There is very little theory developed concerning the use of control 
variates to increase the precision of an estimate. 

If the control variate is 7 with known mean n and variance <n 2 and 
correlation p with z, the estimate is 


2 6 ~z~oty+ap 

= V(z~ay) — (<r 2 2<xp<jco -f- <x 2 <x) 2 )jn 
This is minimised by choosing a = 

co 

Then VOe) = e 2 (l-p*)/n 

The technique normally estimates a by least squares as 

XjVh-nyz 

Lyr-y 

i 

Now regard the set of sample values as a single multivariate sample. In 
repetition of sub-samples we are concerned with 


E{x(y-n)} = £ 


Z 0\—TlOh-ziZO.-iri' 

i _ i 

- nZ(y t -y) 2 J 


^0 


in general. 


Thus estimation of a from the data will introduce bias. This can be 
avoided by dividing the sample into two parts S 0 and S L containing «„ 
and n x samples respectively. 

Put $ k = 'Z(y,-y k )(z i -z k )ix(z i -z k ) 2 \ 

where y k = fyjn k z t = £ zjn k k = °> 1 

s « s k ) 
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Now « 0 is independent of z u y t and is independent of z 0 , To- Thus 
we may use the estimate 

s 7 = i[{z, - « 2 (y 1 - #0 } + -«i(T 2 - R)}] 

Each of 2,-oq-t (y,-fi)(i = °» 0 is an unbiased estimate and s0 their 
mean is also unbiased. 

Now 

from the theory of least squares. 

After a little tedious algebra we obtain 

1/(2,) = J2(r 2 (l-p 2 )^ + ^ + E(aiTi) £ ( a 2T2) 


= i<T 2 (l-p 2 ', - + - +t 


This is clearly minimised if n/2 and the variance becomes 

2 <r 2 (l -p 2 )/« 

The price paid to remove the bias is quite high. A generalisation of 
this technique due to Tukey helps. Divide the total sample into k equal 
parts and estimate a from each of the samples formed by deleting one 
part at a time. If the parts are S U S 2 , ■■■, *S k , define 

X (y,-yj ) 2 

j ieSj if:S J 

k v - k y z . 

y, = (k^)nh, yi ' ' 


__ k vp 

Ti = -1 y> 


k TT* 

= ~l Z i 


= 7 

k i 

Then to the same order of accuracy 

. k < 7 2 (1 -P 2 )^_nf f ' 


Clearly k cannot be made too large or the neglected term will become 
comparable with the main term. 
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The control variate technique involves more labour than simple 
sampling and there is a minimum correlation between y and z before the 
method is economic. Suppose the labour of the control variate tech¬ 
nique is X times that of the simple sampling procedure. In most cases 
X^2. 

The control variate technique is advantageous if 



2\X O 

Ph < - 
n n 


i.e. p > fejX i)+l 

V kX 

For the ideal case X — 2, k -*■ co the criterion becomes p > -L 

V 2 

Hammersley has suggested the use of negative correlation between two 
estimates as a method of reducing variation. Basically these techniques 
depend on the fact that if z x and z 2 are unbiased estimates of some 
parameter, then 

z = a x z x +a 2 z 2 (where a x +a 2 = 1) 
is also unbiased and the variance of z is given by 

U(z) = a\o\ + 2pa 1 a 2 o l c 2 + a 1 2 ol 
This is minimised by choosing a x and a 2 as 

a _ a 2 2 ~pa i (T 2 a = (T 2 -p<r 1 cr 2 

o 2 2 -2pg x g 2 + g\ 2 g\-2pg x g 2 + g\ 

With these values U(z) reduces to 

ti *1*2(1 -p 2 ) 

u opt\^) y -Z- ? 

G x ~2pG x G 2 + Gl 

It can be shown that this is a monotonic increasing function of p and 
so the larger the negative correlation between z u z 2 , the more accurate 
is the estimate z. 

The usual and most important case is g x = g 2 when as would be 
expected 

a t =a 2 = i U(z) = a ^(\ + p) 

In this antithetic procedure, for a gain in efficiency we shall require 
Xg\g\{X-p 2 ) < min {g 2 u cI)(g\-2pg x g 2 + g 2 2 ) 
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Assuming a 1 ^ <r 2 this reduces to 

kelp 2 —2po 1 o 1 + o i — (A— 1)<t 2 > 0 

This is true for all p < 0 if aja 2 ^ X-\. (The simple case a 1 = <x 2 , 

X — 2 satisfies this relationship.) If this is not so, then the equality can 
be reduced to a pair of linear inequalities. This gives the conditions 



The principal use of this technique has been with stratified sampling. 
If the strata are chosen so that p s = p const, then we may sample by 
taking a point in each stratum and repeating n/k times. 

Consider a univariate case. Suppose as is usual that for each /, pj(x) 
is non-zero throughout the region Rj which now reduces to a simple 
interval. The estimate for z i may be taken as z(x)pj(x) where * is 
uniform over the range of R P 

Thus writing/(x) for z(x)pj(x) and assuming that [ct p a J+1 ] is the 
range Rj our estimate is 

J = 1 

where {/ are uniform random variates. The normal stratified sampling 
procedure would take the independent. The antithetic variate 
technique consists of taking the variate negatively correlated. 

In the simplest case k — 2, the best possible relation is { 2 - 1 —£i 
which ensures a correlation of — 1 in the s but not of course in the 
two components of the estimate. 

For larger values of k , the choice of a relationship is more difficult 
and in practice the technique resolves into using a power of 2 for k. 

The whole process rapidly becomes similar in form to the ordinary 
methods of numerical analysis for evaluating the integral of a function 
as a weighted sum of ordinates. 

The application of antithetic variates to other situations will be 
discussed in a later chapter. 



CHAPTER 10 

SIMPLE QUEUEING PROBLEMS 

The sampling experiments discussed in earlier chapters have been 
applied to formal mathematical problems. We now turn to a study of 
more practical experiments concerned with industrial problems The 
major difference here is that the statistics of interest are no longer 
derived as complicated functions of probability distributions but are 
expressions characterising a probabilistic process. 

The simplest processes arising in industry concern queueing systems 
and in this chapter we illustrate various methods of conducting sampling 
experiments on such systems. 

A simple queueing system consists of a collection of items to be 
processed, and a mechanism for processing them. To specify the system 
we must know: 

( 1 ) the time taken by the serving mechanism (the server) to process 
a given item; 

(ii) the rules for the formation of the collection of items requiring 

service; & 

(iii) t ^ le ru * es f ' or selecting an item for service by the server from the 
collection. 

The time to service an item is usually variable and indeed it is this 
variability that raises the problem of the behaviour of the system It is 
natural to describe this variability by a probability distribution of 
service times for the items. This still leaves open the relation between 
the service time of successive items. In the absence of definite evidence 
that one service can affect the next, independence is assumed and the 
set of service times is taken on samples of independent items from the 
service time distribution . 

This assumes that the items serviced constitute a homogeneous Set. 

It is possible to generalise the situation and stratify the items and 
associate a different service time distribution with each sub-set. 

The collection of items requiring service varies with time and the rules 
for its change form the most convenient definition of the collection (the 
queue). After processing, items usually leave the queue. It remains to 
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specify the arrival of items to the queue. This is most conveniently done 
by specifying the times that items join the queue and, if there is strati¬ 
fication, what sub-set they belong to. These times do not usually form a 
regular sequence, i.e. the inter-arrival intervals vary. Once again, the 
most convenient description of these varying intervals is by a prob¬ 
ability distribution. The question of the relation between successive 
intervals also arises and in the face of ignorance about the relation, 
successive intervals are assumed independent. However, if we are 
dealing with a succession of queues, the intervals between arrivals at 
one queue may be determined by the output of a preceding process. 

Finally, the queue discipline must be specified. This consists of a rule 
for determining if the server is to serve an item and if so which item to 
serve. Usually, in the interests of high throughput, service is always 
given if there is an item in the queue to receive it. The choice of item 
usually depends on the order of arrival and the common rule is first 
come, first served. Special cases, such as railway wagons in a siding 
awaiting shunting, give rise to a last come, first served discipline. If the 
items are stratified, then the sub-sets may be priority-ordered with 
arrival time priority within sub-sets. In this case, the possibility o 
abandoning service of an item on arrival of a higher priority one arises, 
and then the rules for completion of the partially serviced item need 

P A11 but the simplest queue systems defy analytic solution, although 
the mathematical theory of queues is a powerful tool for discovering 

the kind of phenomenon that can arise. 

Once a complete description of the system is given it becomes pos- 
sible to simulate it by sampling the appropriate distributions for process 
and inter-arrival times, and performing the necessary book-keeping. 
Conversely, the attempt to describe the procedure for simulating the 
system is often a valuable way of disclosing gaps m the complete de¬ 
scription of the queue system. ■ 

We now turn to the problems of describing the results of simulating 
a queue system. The purpose of such a sampling study is to measure the 
congestion that the variable input and process time produce. The 
congestion can either be measured from the point of view of the system 
—how many items await service—or from that of the items—how long 
they wait. In the latter case, the natural description is to give the distri¬ 
bution of waiting times. The system description is more difficult as the 
number waiting at any instant depends on the number waiting at earlier 
times. Usually, the maximum queue that develops m a given period 
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the'items™* 361 ° f ' principal interest - space has to be provided for 

Suppose then that a simulation is required to study the maximum 
n g th of queue which develops in a given period, in a system consisting 
a single service with a given service time distribution (R, say) in- 
dependent of the item being served or of any history of service toes 
which is fed by a source of items arriving at intervals that have another 
g ven distribution (S, say). Suppose the system starts with an empty 
queue and an idle server. p y 

All that is required is a count of the queue size as items enter and leave 
it. These changes are caused by events and a history of the events must 
be generated. As each item arrives, it is possible to predict the time of 
arnval of the next item by drawing a sample from S. Similarly, as each 
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Figure 25 
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item starts processing, the time at which 
predicted by drawing a sample from R. 

At any stage, the next event to occur is 


it will be completed can be 
found from the minimum of 


these two predictions. . u r 

Let the current time that has been reached m the simulation be r 0 , 
the predicted next arrival T lt the predicted end of service T 2 and the 
specified duration T s . Let the current size of queue be n and use a 
marker / to denote if the server is active or not, set / - 0 if idle and 
if active. Let the maximum queue size to date be m. 

The flow diagram is then as shown in Figure 25. 

In practice, the calculation would be repeated many times so that the 
statistical behaviour of the maximum queue size could be studied, an 
this flow diagram would be a sub-flow of a complete sampling pro¬ 
cedure. For example, it may be proposed to accommodate a queue of 
size M. What is the frequency p of runs for which this is not large 


enough? 



Figure 26 


If in the practical problem M can be adjusted, it is more economic 
to form a frequency distribution for m and use this to estimate pfot 
each of a range of values of M. This same function can be used to 
estimate the necessary M to ensure that the frequency of overload of 
the queueing area is limited to any particular figure. 

The frequency being estimated is, by the nature of the problem a low 
one and consequently will have a high variability, and thus will need 
a large sample to estimate it accurately. What can be done to reduce 


the sampling required ? 

First in the sub-flow diagram, simulation can be stopped as soon as m 
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reaches the value M (as the maximum in the range under consideration! 
since m is a monotonic function of the duration. 

But a more powerful technique is to divide the total rhi™** • * 
two equal parts and perform \ set of simuSi^ ove! ££ 

recording both the maximum and the final queues These cases that 
have a large final queue are most likely to lead to iJf h 

queues in the second half of the duration aid a seconTs^ThaTf 
tjUration simulations can be performed, using different starting condi- 

final^r first simu!ations > estimates of frequencies p(ri) of different 
final queue lengths n are available and can be used to weight the condT 
tional distribution of maximum queue length in the sec^d duratimi 
In symbols, we take p(m, n\r) to be the frequency of the system reach 

: n a 8 „d zzg^sr of w - a ha,f — - 5£ 

dumtLn is tribUti ° n ° f maXimUm qUeUe Iength in the fl«t half 
Pi ( m » ~) X p(m, n|0) 

n 

and distribution of final length n is 

Pi(-,n) - £ p ( m> ,j|0) 

m 

If the duration is long compared with the individual process times 
and inter-arrival intervals, the distribution of the maxfmum ^Vn T 

second half is nearly independent of that of min the first half. 6 

P^ m > n) = Y, Pi( —, r)p(m, n\r) 

and p ,<».-)- Z H-,r) I »|,) - £ ,,(-, rMm , - | r) 

- -*■ — 

d is SL*' ( r"“Tr i " b i r p,ing fr ° m ,he 

maximum. and formin § the distribution of the 

noulmS?* iS .fi°! u Uite aCCUrate as the distribution p(m, n, r) is 

s^ . y ( ! heinitlal queue size r but also dependson 
of the server and the time of arrival of the first item. 
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In principle, the technique described for the single service queue of 
homogeneous items can be extended to several servers and a stratified 

PO For a ex°am°p f le! e suppose there is one server and two types of item; the 
server has a P different distribution of process times for each type of 
item. Assume that the first type of item has complete P™ nt ^ r * 
second type. There are at least two possible ways of describing the p 

t0 The first assumes that the items of the two kinds arrive independently 
and each has its own characteristic inter-arrival d^trAutions. The sec nj 
assumes that the items arrive with a given inter-arrival d'^ribut on and 
on arrival are categorised as type 1 or 2 by random choice, the first 
item consisting of p% of the input and each choice being ™ ade 
dependently of the previous choices. It can be shown that these two 
models only coincide if the inter-arrival distributions are exponents . 
Of course, both models are capable of several complications. 

Suppose we make the first assumption. A possible flow diagram 
for a simulation to find the maximum queue developing is given in 

F Tet e T 2 0 7 represent the simulation time, T x , T 2 the times of arrival of the 
next items of each kind; T, the time of end of service on the currently 
processing item by the server and T 4 the duration. Suppose «„ n 2 give 
current size of the queues of the two kinds, n the total queue and m 
maximum queue to date. Use a mark 1 to indicate the state of the server 
“empty; / = U serving first type of item; / = 2, serving second 
type of item. Let the two process distributions be Rl, R2 respectively 
and the two inter-arrival distributions be Sl,S2 respectively. 

The flow diagram based on this method will become even more com 
plicated if the types of item are more numerous; if there is more than 
one server the network becomes almost impossible to draw. 

However, there is a more systematic method of proceeding Suppose 
there are k types of item with priority of service given by then^ type 
number. Let the times of arrival of items be T„ i = 1, ■ • ■ • o ° eln &’ 
as usual, simulation time. Let there be * servers each independently 
drawing from the queue. Suppose in the event of two servers competing 
for an item to serve that a similar priority system based on server 
number applies. Let the times of completion of service by the server 
be T ,■ = i, ..s. Let the service time distribution of the ith 

server + serving the /th type of item be R(i, ])■ Suppose n t (i = 1, ..., k) 
give the current sizes of queues of each kind of item, n the total queue 
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Figure 27 


4 7 lne ma * Imum queue to date. Use marks L to denote the state 

z t rsurr' “““ to; ,he * - » r= 

event (service-end or item-arrival) which actually occurs a subdlow is 
entered. At the conclusion of each sub-flow the search is resumed 
The need to set the time of an idle server to a large value is 
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unnecessary, since in the time advancing search, only times greater than 
the present simulation time need be considered. 

The flow diagram is as follows: 



Figure 28 


If the utilisation of the servers is of interest, this can be obtained from 
the waiting times and the flow diagram is easily adjusted for this. 
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Suppose the waiting times of the servers are being stored in W (initially 
zero) then the order M 

w u +t 0 -t u 

added to the sub-flow marked (A) will collect the waiting times and then 
the final printing must be adjusted to print these quantities. We shall 
find that the sub-flow enclosed in the dotted area is of constant occur¬ 
rence and is abbreviated in later diagrams to ADVANCE T 0 . 

If only waiting times are required, we can proceed differently. First, 
consider a simple single service queue of one type of item. Suppose that 
the server has been waiting and has just received an item. Let the next 
item arrive at T l and let service be complete at T 2 . 

There will be waiting on the next item if T 1 > T 2 of duration T l — T 2 . 
Otherwise there will be no waiting and the second next item will arrive 
at T\ say, and the processing of the next item will finish at r 2 . The second 
next item will cause a wait only if T\ > T v 

In this way we can determine the next waiting time, when it occurs, 
and the number of the items n processed since the last delay. It is more 
convenient now to limit the sampling to a fixed number of waiting 
times (held in a register m, say). Suppose the inter-arrival and process 
time distributions are 7?1 and R2 respectively. Then the flow diagram 
reads: 



Figure 29 


This technique is an example of a recurrent event —a concept intro- 
duced by Feller. We study the queue process only at times wheij a 
specified event occurs; in this case the event is the queue becoming 
empty. We are able, for this special state of the queue, to describe the 





128 THE ART OF SIMULATION 

distribution of time between the occurrences of this state without 
studying all the intervening states. This arises because there is only one 
change of the system giving rise to the state of a zero queue. For all 
other queue lengths there are two ways to reach the length and the 
technique fails. 

In particular, it is not possible to adapt this technique to find intervals 
between the time when the queue reaches a fixed maximum. 

However, it is possible to generalise the waiting time formulation 
to deal with a stratified input. Suppose the items fall into /c classes and, 
as before, initially that one item of class i, say, has just arrived and has 
been seized by the waiting server. Let inter-arrival distributions of items 
of class i be Si and assume that they all have common service time 
distribution R. 



Figure 30 
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The sub-flow enclosed in dotted lines determines the minimum of a 
set and identifies an element attaining that lower bound. This process 
is so commonly required that the abbreviated expression 

O’, S) f = min (TJ 

will be used. 

If several values attain the lower bound, this routine gives the one 
of lowest index number. By changing the inequality to T g .S' the 
index given will be the highest of those eligible. This will be denoted by 

(S, 0' = min (T„) 

IV 

Similarly, if a set of tests W must be satisfied before an item is 
admissible as a candidate in the search we write 

0 , sy = min (T u )/W 

u 

For example if a value W a is associated with each machine and we 
require to determine the earliest-numbered machine with minimum 
value of T u for those with non-zero IF-values, we would write 

(i, S)' = min (S u )/W u ft 0 

M 

The flow diagram for this is 



If there is more than one test in W these follow the / separated by 
commas, e.g. 


(i, SY - min (T u )/W u * 0, = 0, P u > 0 

u 


A.S.— 5* 


J3Q THE ART OF SIMULATION 

A similar notation is used to determine the element maximising a 


quantity S 


(i, S)' - max (S H ) 


In the present example, 
minimisation is used. T is 


it is of no consequence which alternative 
advanced for each of any set of items arriving 


simultaneously, before S is advanced. . 1 f 

If the server has a different process time distribution for each type of 
item, a slightly different procedure is necessary. We shall assume as 
before that the index priority system applies. 



Figure 32 


In this and the preceding example, the initial conditions are not 
completely specified, as at the end of the waiting period the items of the 
other classes must have their return times fixed. In practice, arbitrary 
but reasonable values can be assigned and the first few results discarded 
until steady conditions are established. 

The problem of many servers to a single queue can be dealt with in 
a similar fashion. Now let T u be times of return of the server (u = 1, 
S), with process time distribution Ru , T the time of next arriva , 
inter-arrival distribution 5. Figure 33 gives a possible flow diagram. 

Finally, it is left as an exercise to the reader to develop the flow 
diagram for the completely general case of multiple servers of a stratified 

queue. 
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The analysis of queues has so far assumed an indefinite supply of 
items of process and this sampling has not depended on the number in 
the queue or the rate of processing. 

Often the items processed involve a carrying mechanism that is 
held in the queue until the item starts processing or in some cases until 
after the process is finished. We shall consider the latter case. An ex¬ 
ample of this is a coal unloading machine, which empties wagons of 
coal into a stock pile and is then released to collect more coal, the 
number of wagons in the queue is limited to the total number of wagons 
and the arrival of wagons cannot then be described by a statistical 
distribution of inter-arrival times. 

Suppose there are k wagons and each has a filling time after release 
by the emptying machine that can be described as a sample from a 
common filling distribution S . Suppose the emptying time is also 
described by a distribution R. 

Let the number of wagons in the queue be g, the time that the 
machine will finish unloading its present wagon (or finished unloading 
its last wagon if it is now idle) be T k+1 , the time that the zth wagon 
will arrive (or did arrive) at the queue is T h (i = 1, .. k) and finally 
the current time is T 0 and duration is T k+2 . 

At any instant, the next change of the system will occur when a 
wagon arrives or the machine finishes emptying a wagon. If it is empty- 
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ing a wagon let >S denote the number of the wagon. Let the queue 
discipline be first come, first served. 

Suppose the required information is the maximum size of the queue 
during the run and a list of the individual waiting times of the wagons. 
Then the flow diagram for the simulation is given by 



Figure 34 


The method can clearly be extended to several machines serving a 
common queue of wagons. The machines should be ordered in priority 
to deal with the possibility of two machines becoming available for 
service simultaneously. Similarly if two wagons join the queue together, 
the one of lowest number will be chosen for service. More complicated 
selection rules can be arranged by extending the programme. 

If only waiting times are required, then the recurrent event technique 
can be used. Suppose there are k wagons and m machines and the 
times of delivery of wagons are W h (/ = 1,2, ..., k) and the times they 
finish a service are If service is in process on wagon i put W t = 0 
(/ = 1, ..., m). The objective is to measure the first m waiting times of 

machines and the interval between waits. 

Suppose at time 0, a service has just started and all machines are 
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now serving, and there are no wagons in the queue, i.e. a wait by a 
machine has just finished. 

Then the next machine to finish service will wait unless a wagon 
delivery has been made before the end of its service; if it does not wait 
then the next possibility of a wait is if the second wagon has not arrived 
before the second service finishes. This process can be repeated until a 
wait occurs. 

Let the name of the wagon being serviced at zero time be held in V, 
then the programme reads 



Figure 35 

There is a similar solution to the problem of determining the waiting 
times of the wagons. This also starts from the condition where a service 
has just started and all items are busy and the queue is empty and we 
test that the next machine completes service before the next arrival. The 
details are left as an exercise to the reader. 

Another variant of the queue problem concerns a multiple service 
queue in which arrivals select a queue by some rule and can then only 
be served by the machine associated with that queue. As a simple 
example, suppose the queue of minimum length on arrival is joined, 
that all machines have a common process distribution R and that the 
inter-arrival intervals are given by the distribution S. 
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If there are k machines whose end of service times are T t and queues 
of length g,(/ = 1, .. k) then with T k+l for the next arrival time, 
T 0 and T k + 2 as before, we have a programme to determine the maxi¬ 
mum queue sizes M^i — 1, 2, ..., k) as follows: 



Figure 36 


The ordering of machines is important in this example as the adjust¬ 
ment of queue lengths must take place before any item arriving at the 
same time makes a selection of the queue to join. 

Finally, we consider the situation in which there is a set of queues 
fed serially from each other, the output of one forming the queue of the 
next. Input to the first machine is supposed to be described as usual by 
an inter-arrival distribution S and the machines have process time 
distributions R t (i = 1, 2, ..., k). It is convenient to change the notation 
and use Tfor current time, T 0 for the arrival events and T k+l for dura¬ 
tion. Set go to a negative value and put jRO for 5. A possible flow 
diagram is given in Figure 37. 
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There are, no doubt, many other queueing systems that could be 
described, but the foregoing has indicated the type of attack that 
will enable a simulation of such systems to be written. 

In the next chapter a different class of problem concerning queties 
is discussed and techniques for handling lists of customers are described. 




CHAPTER 11 


FURTHER QUEUE PROBLEMS 

The analyses of the previous chapter have all been system or server 
oriented. We now turn to a customer oriented analysis and derive the 
history of each customer in the system. In general, this is much more 
complicated and involves keeping lists for each item in the queue. In the 
other analyses, the queue discipline, apart from the priority rule, has 
not been invoked. In a customer analysis, the discipline becomes all 
important. 

However, the simple single server, unstratified, first come, first 
served queue system is amenable to a simple analysis. The relevant 
history of an item consists of the time of arrival and the times service 
starts and finishes. Let these be denoted by X n Y r , Z r respectively. 
Then we have the following recursive equations: 

X r+1 = X r + sample S 
Y r+1 = max (X r+1 , Z r ) 

Z r+1 = 7 r+1 + sample R 

where R and S are the service time and inter-arrival distributions 
respectively. 

In addition, the cumulative idle time W r of the server can be obtained 
from 

W r+1 = W r + max (0, X r+1 -Z r ) 

The stratified queue can also be dealt with in this fashion if there is 
no priority system and the description of the arrivals follow the second 
form suggested in the opening section of the last chapter. The equations 
simply become in an obvious notation: 

u = sample of item type 
X r+1 = X r +sample S u 
Y r+ i = max (X t+u Z r ) 

Z r + i = y r+1 + sample R u 
W r+l = W r + max (0, X r+i -Z e ) 
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For any other queue discipline, we must revert to the event type of 
approach. 

Consider the single server queue with a stratified queue and a 
queue discipline based on priority and first come, first served, within 
a stratum. As in the system-oriented case described earlier the times of 
arrival of items of the different kinds and the time of service are required. 
In addition, each item in the queue needs historical information about 
it storing in a list. 

The list will be arranged to accommodate a maximal number of 
items. This can be fixed either by guess-work or employing a system 
oriented simulation to determine the maximal number that will be in 
the queue and then using the same pseudo-random number starting 
values to ensure generating the same sample times. In the former case 
steps must be taken to stop the simulation if the guess proves wrong. 
The positions in the list are numbered 1,2, ..., m and each is used to 
store 

U s type of item 
X s time of arrival 
Y s time of service start 
Z s time of service end. 

If a position is not in use U s — 0 

The marker / associated with the server is now used to hold the list 
number s of the item under service. If / = 0 the server is idle. Using the 
remaining notation as before, we can draw the flow diagram to print 
the history of each item when service is complete, as in Figure 38. 

The sub-flow enclosed in dotted lines occurs frequently and can be 
abbreviated to 

LIST ENTRY (U, S)’ = («, T 0 ) 

This technique has the disadvantage that the whole list is searched 
each time an item is required, and the items could have been stored in 
separate sub-lists on entry. This raises the problem of the necessary 
length for each sub-list and the tendency to use more storage to ensure 
that no sub-list overflows. This extra storage required is usually an 
overriding objection to the multiple list arrangement. 

The alternative left is to sort the list into a priority order. For this 
to be effective, extensive book-keeping of the break points in the lists 
becomes necessary. We can arrange for each section to contain the items 
in any order or we can maintain a strict ordering by time of arrival. We 
consider the latter scheme first. 
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obtained from a selected item in the list. The process 

v - u„ X' = x„ r = Y s , r = z s 

is abbreviated to ‘READ s\ The reverse process is abbreviated to 
‘ENTER s\ 

Withdrawing a wth type item from such an ordered list is merely 

s' = M„ 

M' U = M U +1 
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READ 5 
U' s = 0 

For entering an item, a second temporary register is desirable. Read¬ 
ing and writing to this are abbreviated to READ 5 and ENTER s. It 
is also necessary to interchange the contents of the registers, which is 
abbreviated to SWITCH. 

The process of reading an item from one class creates a space in the 
sub-list of the next higher class. The first step in entering an item is to 
see if such a space (or spaces) exist(s). Otherwise the items in the list 
must be moved down until the last displaced item can be entered in an 
empty position. This may not happen before the bottom of the list 
is reached, but there may still be room in an earlier part of the list. 
Thus it is convenient to regard the list as circular, the last position 
being followed by the first. In this displacing process the registers M u 
must be adjusted. 

A possible flow diagram is given below: 



An alternative method of proceeding in this case of ordered sub-lists 
is to divide the available storage into units and use ordered aggregates 
of these to form a sub-list. As the items in any sub-list increase, unused 
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units are added to the aggregate and as items are used, emptying a unit, 
this is released for use by other sub-lists. 

This method is of value only if the total number of items likely to 
be stored is very large, since, if there is to be little wastage, each unit 
must be small compared to the total storage, and as each unit requires 
space to describe how it is being used, each unit must be large compared 
to the number of variables used to describe its use. 

There are two variants of the method. The first regards the units as 
items in a set of ordered sub-lists. To enter an item, the appropriate sub¬ 
list of units is read to determine the last item in the sub-list. This gives 
the name of the unit to enter the item. This unit is scanned and, if any 
space is left, the item inserted in the next space in the unit. Otherwise the 
sub-list of empty units is consulted to find an empty unit and the item 
inserted into the leading space; at the same time the unit is deleted from 
the sub-list of empty items and entered on the sub-list of units allotted 
to this appropriate category. 

To remove an item from the list, the appropriate sub-list of units is 
consulted and the leading item noted. This gives the name of the unit 
whose leading item is the required item. The item is removed from the 
unit and if this is now empty, the leading item in the sub-list of units is 
removed and entered as the last number in the sub-list of empty units. 

A flexible storage arrangement could be used for the sub-lists, but 
the complexity of the process in this case would be unlikely to be worth 
the storage saved, bearing in mind that the orders to execute the arith¬ 
metic of the process also take storage space. Using fixed length sub¬ 
lists for the names of units, the storage arrangement is illustrated 
diagrammatically in Figure 40. 

The unit lists are on the left giving for each of three types of item the 
unit number and an associated serial number to give order of entry. 
The units holding the items are displayed on the right and the chains 
linking them into a complete list are shown. 

To enter an item in type 1 sub-list, the units sub-list is scanned for 
maximum serial number; in this case 8. Unit 20 is given as the unit 
containing the tail of the list. This unit is now scanned to find the first 
empty cell; in this case cell 4. 

If an item is to be entered in the type 2 sub-list the unit 18 is selected 
since its serial number is 4, the maximum in that sub-list. On scanning 
this sub-list it is found to be full. The first non-empty cell of the empty 
units list gives unit 17. The item is entered in the first cell of this unit 
and the first empty cell in the type 2 sub-list is found. This is cell 6, 
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in which we enter the pair (17, 5). The cell containing 17 in the empty 
sub-list is cleared. 

To remove an item, the unit sub-list is searched for minimum serial 
number. For a type 3 item this gives 10, corresponding to unit 9. 
Unit 9 is searched for the first non-zero item, and this is found to be last. 
Removing this item leaves unit 9 empty. Thus the cell serial number 10 
is cleared and an empty cell in the empty list is found and 9 inserted. 

In practice, the scan of the units can be avoided by keeping a record 
of the position of the head and tail of each list. Let there be k items in 
each unit list with entries (U h S t ) i = 1, ..., k; h, t, the ordinal number 
of the head and tail of the list. Each variable requires a further suffix j 
to denote the sub-list number, but this will be omitted without loss of 
generality. The empty list is unordered and is denoted by (K f ). 

The item in position m of unit / is denoted by I Um . Let X, x, ij be work¬ 
ing spaces. The flow diagram for an entry of an item / is then given by: 



Similarly for a withdrawal we have: 



Figure 42 
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We have L[$ = (21, 17) L' 17 = (18, 0) L 6 — (0, 24) 

(i, ft, j, <)' = (23, 2, 17,1) (X, y)' = (26,13) 

For the removal of a type 3 item (/, hj, t ) = (9, 9, 14, 1) 
i'n = (1» 9) 14,(13,0) L' i0 = (0,12) 

(i, />, j, t)' = (10, 1, 14,1) (x, y)’ = (6, 9) 

This chain method is superior to the list of units method m the 
amount of storage involved and the amount of calculation involved, 
and will well repay study to understand the mechanism. 

It may be used to deal with items if the amount of storage for each 
item required is high. Each item now has a link store and the 4-part 
cell reduces to a two-part one as for the empty units chain. In fact, the 
units of the former analysis are regarded as items. The simplification of 

the calculation is left to the reader. . 

We now turn to the problem of maintaining unordered sub-lists in a 
common list. This arises if the criterion for selecting an item from a 
sub-list is different from the first come, first served basis. For example, 
the item selected may be that with the earliest delivery promise. 

Reverting to the notation of the first sub-list technique (p. 138) the 
selection of an item of type u is achieved by a scan of the sub-list where 
the start and finish cell names are M u and M u+1 -1 respectively. 

To enter an item / of type u we first see if it can be inserted at the end 
of the present sub-list. If not, it is entered in place of the head of the 
next sub-list (w+1) and the process repeated to insert the displaced 
item into sub-list (u+ 1). This continues until the item can be inserted 
or the whole list has been scanned. In this latter case an alarm is given. 



Figure 44 
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It is possible to allow free use of the list for storing items of all types 
and at each scan to record the first and last items of the type being 
selected and to use these values to determine the range of next scan. 
The attempt to enter a new item should start at the first item of the same 
kind and if an empty cell is not found before the last item of that kind, 
the last item record is adjusted. This assumes a circular list. 

However, this method involves the storage of two references for each 
sub-list and will always involve larger scans than the first method. 

If the storage capacity allows a second record for each sub-list, a 
better procedure is to record the location of the last item in each §ub- 
list in N ui say. 

Then the insertion of an item of type u reduces to: 



These techniques for handling lists can also be used to deal with 
items passing through a queueing system. The sub-lists then correspond 
to items in the different queues. By rearranging the items, searches for 
them when service can be given can be made over a reduced range. 

Lists also arise in the study of other industrial simulations. This forms 
the subject of the next chapter. 
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GENERAL SIMULATION PROBLEMS 


Since the popularisation of queue theory, it has become common 
practice to describe an industrial plant as an aggregate of inter-con¬ 
nected queues. The theory that gave birth to this view of a plant is not, 
however, capable of offering a solution to many of the problems of 
industrial congestion and simulation is frequently used to study such 
problems. Earlier chapters have shown various techniques for simu¬ 
lating queue systems of varying complexity, and the extension to more 
complex systems can be envisaged. 

However, the queue description does not meet all industrial situa¬ 
tions. The most important exception is when machines undergo cycles 
of activities and co-operate in some (but not all) of their phases with 
other cyclic machines. These, in turn, co-operate with each other in 
other phases and so on. The time taken in the various phases are ran¬ 
dom variables and so machines do not always reach the co-operating 
phases together. This induces a delay in one of the machines. The whole 
situation can be likened to a set of stochastic gear wheels clicking round 
irregularly. 

As a simple but illuminating example, consider an electric steel¬ 
making furnace. Such a furnace goes through a cycle of five phases: 
first, it is charged with pig iron and steel scrap; second, this is melted; 
then it is refined until the content of trace elements such as carbon have 
reached the required level; the resulting steel is poured (or tapped) 
into a ladle, and finally the furnace lining is repaired by charging a 
refractory material which replaces that lost by chemical erosion during 
the cycle. This last phase is known as ‘fettling’. 

The ladle used for tapping is held by a crane which removes it when 
full to a ‘teeming’ platform where it is poured into moulds. These 
moulds are mounted on a railway bogie which, after standing, is hauled 
by a locomotive to a stripping bay where the moulds are removed from 
the now solid steel. After stripping, the locomotive returns the empty 
moulds for refilling. Similarly, the ladle crane takes the empty ladle for 
cleaning before it is used for tapping again. 

In practice, the basket, crane and loco cycles are fast enough to ensure 
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that very little interference takes place, but often these machines serve 
several furnaces, when the congestion problem can arise. It is instructive 
to imagine that the various cycle times are similar so that the technique 
for simulating such a system can be developed. The complication of 
several furnaces and possibly several cranes and locos can be introduce 

The process can be described in the diagrammatic form of Figure 46. 
Each machine has a set of states corresponding to its various phases 
and distributions to describe the time spent in these phases. We mus 


now introduce some notation. 

Let the machines be numbered 1 for the basket, 2 for the furnace, 
3 for the ladle and 4 for the loco. Let T t be the time the ith machine 
will finish its current phase (or if it is idle the time it finished its last 
phase); S, be the state of the machine (or if it is idle its last state). 
TsPnmher the states as follows: 


Basket Furnace 

Charging 0 Charging 0 

Filling 1 Melting and 

Refining 1 

Tapping 2 

Fettling 3 


Ladle Loco 

Teeming 0 Teeming 0 

Cleaning 1 Hauling full 1 

Tapping 2 Stripping 2 

Hauling empty 3 


The activities of melting and refining can be coalesced as the furnace 
moves from the first to the second state without invoking any other 
machine. This also avoids the difficulty that the refining time may be 
partly dependent on the melting time. . 

Let the time distribution of the ith machine in the/th state be ; 
T 0 be the current time, T 5 be the duration. Let S 5 be a dummy vanab e 
always taking the value 0. Suppose we wish to measure the times be¬ 
tween successive casts of steel being stripped. 

Then the programme must arrange to change the state and time asso¬ 
ciated with each machine as it reaches the end of each phase. The activi¬ 
ties of charging, tapping and teeming will demand special treatment as 
in each case two machines must have completed their preceding phases. 
The first attempt at the flow diagram might appear as shown m 

Figure 47. .. . 

This diagram is seen to be very repetitive but not quite systematic, 

some of the exits to the ‘ct /’ routine do calculations, others do not. 
However, it is possible to determine for each machine as its state is 
changed whether or not a state change is automatic on leaving that state. 


0 


— UP| 



Figure 47 
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Thus, for example, as the basket enters the state of charging, it can 
be predicted that, at the end of charging, filling will take place. 

The required systematisation can be achieved by introducing some 
more constants. Let C, be the cycle length of machine / and define b i} 
as one if machine i changes to its next state on leaving the state /; 
and zero otherwise. For machine 5, put b 5 ^ 0 = 2. Introduce variables 
B t to hold the current values of b tj for each machine. 



Figure 48 


As a second problem, consider the problem of simulating the passage 
of traffic across time-sequenced traffic lights at a junction. 

Since the example is for illustrative purposes, we will assume that 
there is no entry to the side road so that we do not need to consider the 
blocking effect of traffic turning right into the side road. The lights go 
through a cycle first allowing the main traffic to flow (state 0), halting 
all traffic (state 1), allowing the side road traffic to enter the mam road 
(state 3) and finally halting all traffic again before re-startmg the cycle. 
Assume that the times in these states are fixed at t 0 , t u t 2i t 3 respectively. 

The rules for describing the passage of traffic across the junction 
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must allow for the fact that vehicles which cross from rest will take 
longer than those which arrive when there is no queue at the lights. 

The proposed rule divides the passage into two parts: in the first the 
vehicle moves from the queue into the junction and then in the second 
actually crosses. As soon as one vehicle has moved into the junction 
another can start to move. In this way a continuous stream of traffic 
across the junction is achieved. If a vehicle arrives when there is no 
queue and it can cross, the first phase takes place instantaneously 
Otherwise, the time a vehicle spends in the first phase is described by a 
distribution R, common for each of the three streams of traffic. The 
time m the second phase cannot be so described since the vehicles must 
leave the junction in order of arrival at the queue. To achieve this a 
fixed time r 4 will be used. Vehicles arrive at the traffic streams with 
inter-arrival times given by distributions S u S 2 , S 3 . 

., I 116 ,™ 16 f °, r the passage of traffic is simply that it can cross if and only 
if the lights allow and there is no vehicle on the junction. 

Let the current time be T 0 , the current (or immediate past) first 
phase crossings for the three streams end at times T t , T, T, and the 
next arrivals of vehicles be at T 4 , T s , T„. Finally, let T, be time of the 
next change of the light, T a the time the crossing will be clear and T a 
the duration. Let S denote the state of the lights and U u U 2 , U 3 be the 
variables associated with the three streams of traffic taking the value 
one if the traffic stream can pass and zero otherwise. Let the queues in 
the three streams be 0,, Q 2 , Q 3 . Stream 3 is chosen as the side road 
traffic and the purpose of the simulation is to measure the maximum 
queue developing in each stream. 

A possible scheme is shown in the following flow diagram (Fig. 49). 
There are some points of special interest in this example. The three 
times T u T 2 , T 3 are reset each time a new vehicle is taken from the 

qU ,n e l ‘ he ;° utine labelled A - When the W ^pties, these times 
will lag behind T 0 and the routine A will not be entered again Vehicles 

can never start moving from a queue by A ; this is achieved by routines B 
and C which switch the V variables to allow traffic to move at the later 
of 0) a favourable light change and (ii) the junction becoming clear of 
traffic and, if there are vehicles queueing, resets T,, T 2 or T 3 and causes 
the junction to become busy with the first vehicle crossing 
Vehicles which arrive when there is no queue proceed straight to the 
junction and only the junction clear time needs adjustment. 

The adjustment of the U variable is most conveniently done as shown 
since if it is transferred to the right-hand side, two versions are necessary,’ 
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Figure 49 


one to adjust them when the lights become favourable if T$ ^ T 0 and 
the other when the junction clears (T s = T 0 ), if the lights are favour- 

able. , . . . u 

In this case the present numbering of the times is essential since the 
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adjustments to T 8 must be made prior to changing the U\ This 
that‘tad end Charlies^ do not get hit by a newtSt' " “ 

P»~ssriss2: r as? 

™ "s 

A pnonty rule must be given for the crane in the event of competition 
This , s supply that handling will take precedence over unloadfng The 
crane always reloads a machine as its next job after unloading ft We 

crarVhe 6 TT ° f ^ StUdy “ to e ^ate the idle time of I 
crane. The whole process is illustrated below for one machine. 



Figure 50 


We assign variables as follows: 

T 0 current time. 

J phased T " the dmeS the maCMneS finish (or finished > their current 

T 4 the time the crane finishes its current job 
T 5 duration. 

the states of the machines. 




154 


THE ART OF SIMULATION 

5 = 1 Loading 
5 = 2 First Processing 
5 = 3 Handling 
5 = 4 Second Processing 
5 = 5 Unloading. 

5 4 the name of machine which the crane is currently serving. 

W the crane waiting time. 

B u B 2 , £ 3 , B s organisational storage 

Put B 5 = 4. 

The flow diagram for a simulation is given below. 
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Here the use of the organisational variables has been taken a stage 
further and used systematically to ensure that the correct adjustments 
are made as each event occurs. 

The searches for the machine to service when it is known that a crane 
is available deserves mention. If no machines satisfy the side conditions 
the cells t, T will not be set and the alternative exit is taken to by-pass 
the adjustments corresponding to the handling or unloading activities. 
It the first search is successful it is clear that the second cannot be so 
since T 4 is now greater than T 0 . 

Looking at these three very different examples as well as the later 
examples of queue systems, it can be seen that a general pattern emerges 
from them all. This can be expressed in the following flow diagram: 



The difference of roles played by the numbered and lettered sub-flows 
deserves comment. A numbered sub-flow is executed at a fixed time, 
i.e. immediately T 0 reaches a time associated with an event. If the sub- 
ow is prefaced by tests which fail, the executive part of the sub-flow 
will not be entered at any other time until some other sub-flow changes 
a time or 6-number. The purpose of the lettered sub-flows is to just so 
reinitiate the numbered sub-flows and to start activities which involve 
the occurrence of otherwise independent events. Examples of the use of 
lettered activities are: 

( 1 ) processing an item on a machine involves both an idle machine 
and a non-empty queue, 

(ii) handling a piece involves both the machine having reached the 
stage where handling is required and that a crane is available. 
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From this general flow diagram and the examples which led to it, 
certain useful concepts can be defined which help in the description of 
flow charts. First, with every time variable, we associate a machine. 
Often this machine has a real existence, but sometimes an imaginary 
machine must be invented. Thus in the road traffic example there are 
imaginary vehicle loaders which take vehicles from a queue into the 
junction and an imaginary machine to measure the occupancy of the 
junction. 

The times associated with these machines define events of some kind, 
and often we associate this with a change of state of the machine. We 
are thus led to the concept of machine-states, which can be an arbitrary 
assignment of numbers to the different conditions of the machine. The 
phases of the furnace of the first example constitute a typical example. 
There is no restriction to a single variable for the specification of the 
state of a machine. 

When a machine i is made to start some activity, its state takes on a 
new number /, say, and the time T t is given by adding the process time 
to T 0 . The pair of instructions 

T\ = T 0 + f 

} 

are spoken of as committing machine i to state ./ for time /; this latter 
is usually provided by a sampling process, but this is not necessarily so, 
e.g. the traffic light machine of the junction example. The test that a 
machine is committed is that 7) > T 0 . If this is not so (i.e. T t 5= 7" 0 ) 
the last activity has been completed and the machine is idle, or to use a 
less flavoured word, is available. 

A machine which has only one job, e.g. a server in a queue, need not 
have a state variable associated with it since the working state is equi¬ 
valent to being committed and the idle state is equivalent to being 
available. 

The various sub-flows adjust the states and times of machines to 
correspond to various activities on the plant and can conveniently be 
described as activities. The two types must be distinguished and the 
notation ^-activities and C-activities is introduced. C-activities are 
conditional ones describing co-operative activities, ^-activities are 
performed immediately a machine becomes available and often contain 
book-keeping instructions (hence B for book-keeping). The instruction 
to obey such a ^-activity on becoming available is associated with a 
machine and the machine is said to be engaged to the activity. Several 
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machines may at any time be engaged to the same activity, in which 
case the activity usually involves the machine name as a variable index. 
A variable index b { is associated with each machine to hold the name of 
the 5-activity required. By convention zero will be used for the null 
activity. 

With this structure understood the specification of a simulation 
becomes the specification of the machines, their states and initial condi¬ 
tions and the individual activities. Thus the simple single server queue 
simulation becomes in an obvious notation: 


Machine 

Name 

States 

1 

Arrival machine 

— 

2 

Server 

Q the size of the queue 

M maximum size of queue 

Initial conditions T x 

I! 

Ctf 

1! 

-sT 

II 

M = 0, T 2 = 0, b 2 = 0. 


51. Item arrives 


Increase queue by one. 

If queue now exceeds maximum adjust the latter. 


|q 7 =q+i1 


(Q)M> 


~M^ol 


Commit the arrival machine for another 
fetch-and-carry time. 

Engage to 51. 


_i_ 


V=T 0 +SAM R 
b/ = 1 




Figure 53 


In practice, since the engagement 
51 is not disturbed this does not require 
renewal in 51 and may be omitted. 


Cl. Item served 



If there is a real queue and the server is available. (§7§)—*— 

HE3U 


—X— 

-HExrri 

Commit the server for a service and reduce 

T 

V 2 =T 0 +SAM S 


the queue by one. 

Q'=Q-1 

-*{EXTT| 


Figure 54 


It is tempting to represent every real machine in a plant by a time- 
state pair of variables, but this is not always necessary or desirable. 
Each machine so introduced lengthens the process for determining the 
time of the next event and if a group of service machines are involved, 
all offering similar service, a C-activity will make a search over these 
machines to find one available. 
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To illustrate the point, suppose we are concerned with a set of nine 
lorries operating a cycle of arriving at a warehouse, waiting to be loaded 
by one of the five fork-lift trucks, travelling to a site and returning 
empty. The obvious procedure to determine the lost time of lorries 
waiting to be loaded is as follows: 

Machine Name States 

1-9 Lorries S, 0 loading, 1 travelling 

10-14 Trucks — 

Initial conditions — FF(total waiting time) = 0. 


51. Lorry Journeys 

Commit lorry to a journey. 
[Mark lorry as travelling.] 
Disengage lorry from B \. 


Cl. Loading 

Set to first crane and lorry. 

Find a lorry available [which has finished 
travelling]. 

Find a crane which is available. 

Record waiting time of lorry. 

Commit crane and lorry for loading 
time. 

[Mark lorry as loading.] 

Engage lorry to 51. 

Return for other lorries. 


E' =T 0 +SAM R 
(journey time 
distribution) 

bi'-o 



In Cl, the return to the ‘ct /’ instruction after committing the loaded 
lorry and crane is to ensure that all idle cranes and returned lorries are 
mated at the proper time. 

It is not necessary in fact to use a state for the lorries since the only 
use for this is in the first test in Cl. If a lorry is available at the time this 
activity is executed, it must be waiting for a load since loaded lorries 
are dispatched by 51 automatically before Cl is reached. Thus the 
expressions in square brackets can be omitted without loss. 

However, a greater economy can be achieved by noting that since 
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lie I™! 8 /°I. 0ne PUrp ? Se ’ il sufflces t0 k “P a count of 


the trucks available. This gives a simulation as follows: 


Machine 


Waiting time W 

Lorries Available trucks Q 


Initial value 
0 
5 


Increase count of trucks available. Commit lorry 
for a journey time. Disengage from Bl. 

Cl 

Set to first lorry. 



If there is at least one truck available, 
find a lorry which is available. 

Record the waiting time. 

Reduce count of trucks available. 
Commit lorry for a loading time. 
Engage lorry to B 1. 



=1™”^?"^““ ‘"'“I * ” ,h *' ,he ‘"* k ■>* 1“ only one 

wil, ,nfce “ md,Ca “” ““ “ f «•» «*■ 



Figure 57 


Similar y the type of activity required (B or C) is revealed by the 
lagram. Loading must be a C-activity since it involves the co-operation 
of machines (even if these are represented by counts) but travelling can 
be a 5-activity as it occurs unconditionally after loading. 

We now consider two further examples. The first is a simple queueing 
process. Items join a queue with a given distribution pf inter-arrival 
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times to await processing by a machine. The items are dealt with singly 
and the process times are independent samples from a given distabutam. 
After processing the items enter a stock. The stock is dispatched 
regular intervals in batches of m items. If the stock is less than m items, 

n °WritT 1°simulaticm to determine the maximum queue which develops, 
the "die timTon the machine and the number of missed drspatches 
given 100 items in stock initially, an empty queue, an idle machine a 
dispatch interval of 50 units, a duration of 500 units, and a bate r si 

35. 

Make an allocation as follows: 


Machine Name 

1 Arrival 

machine 

2 Process 

machine 


Q queue 


Initial Conditions 

T x = 1 m/c arrives 

immediately 
7\ = 0 m/c idle 


iuaciniiv g _ q 

U cumulative idle 

time U = 0 

M maximum queue M=0 
3 Dispatch S items in stock T 3 = 1 

machine s = 100 

W missed dispatch 

count JV = 0 

Also set initially b x = 1, b 2 = 0, b 3 = 3, b A = 4. 

Use distributions Rl and R2 for inter-arrival times and process times 

c r*t i VP.1 V. 


51. Arrivals 

When the arrival machine returns, sample 
a fetch-and-carry time, commit the machine 
to fetch-and-carry, increase the queue of 
items at the process machine by one and if 
the queue is now larger than the current 
maximum, adjust the latter. The initial 
engagement to 51 still holds and this 
machine repeats this cycle indefinitely. 



Figure 58 
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B2. Increase Stock 

When an item finishes being processed on the 
process-machine, increase the stock by one 
item. The machine remains available until 
C2 recommits and engages it. 
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S'=S+1 


B3. Dispatch 

When the dispatcher returns for the next 
delivery, test if there are at least 35 items in 
stock and, if so, reduce stock by 35, and 
commit the dispatcher until the next dispatch 
time. Otherwise do nothing. Cl will then 
count the delay and dispatch the dispatcher. 



- MexHH 


S'-S -35 


HESS 


B4. Printing 

When the printing machine arrives at the PRINT M 

end of the run, print the results. PRINT u 

PRINT W 

Figure 59 


Cl. Delays in Dispatch 

If the dispatch machine is available (which 
can only happen if B3 failed to dispatch 35 
items), count a delay and commit the dis¬ 
patcher until the next dispatch time. 


4Txm 


V =V50 
W'=W+1 



C2. Reduce Queue 

If the process machine is available and 
there is a queue of items for it, sample a 
process time for next item, add idle time to 
the cumulative total, remove one item from 
the queue and commit the machine to pro¬ 
cess the item. 


- 

*|EXIT| 

(Q>o)—x- 

♦IexitI 

U' = u+T 0 -T 2 


Q'=Q-1 

—ExjH 

V s V SAM R2 


Figure 60 


The ordering of the machines is relevant to the success of this model. 
B2 is entered before B3, since the machines involved are in that order. 
The stock is correctly increased before the test for 35 items is made. 

A.S.— 6 * 
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The second example concerns a circular bus route which is served by 
8 buses each able to carry up to 56 passengers. There are 12 bus stops 
around the route, and the length of each passenger journey is an 
independent sample from one distribution of journey lengths. Pas¬ 
sengers join queues awaiting buses at each stop with inter-arrival times 
which are independent samples from distributions particular to each 
stop. The length of time any bus takes to travel from one stop to 
another is an independent sample from a distribution of stop-to-stop 
times. 

The disembarkation and embarkation times for passengers at each 
stop are calculated as 5 units for the first passenger and an extra unit 
for each successive passenger. 

Write a simulation to determine the maximum queue which is likely 
to develop at each bus stop. 


Machine 

Name 


1-8 

Buses 

Ui the next stop for the bus i 

p. the number of passengers on the bus i 

Q { the number of passengers joining the bus 
i at the current stop 

9-21 

Stops 

Q t actual queue at the stop (/-8) 


(Passenger 

Mi maximum queue at the stop (i- 8) 


arrival machine) 

jP (r>s) number of people on the rth bus due 
to alight at the sth stop 

W working space 


Let the inter-arrival distribution for fth stop be R t and the travel 
time distribution to stop j be Sj. Let the journey distance distribution 
be J. 

Engage all stops to B 1 (b 9 = b l0 = . .. = b 21 = 1) and all travelling 
buses to B2, Enter B3 on duration. 

B \. Bus Stop Queue 

Add passenger to queue. Reset loading time 
for first passenger. Commit arrival machine 
to next arrival. 


Adjust maximum queue if necessary. 



Figure 61 



B2. Disembark 
Disengage machine. 

Pick up stop number. 

If there are passengers on this bus to 
alight at this stop, 


remove first passenger, 

calculate alighting time, 

adjust passenger lists and commit the 
bus to unload. 


Figure 62 


B3, Printing 

Print the values of M 9 .. . M 21 . 

Cl. Load Passengers 

Find an available bus, with empty seats, 
which is at a bus stop with a non-empty 
queue. 


Sample a journey distance (number of 
stops travelled) and determine the des¬ 
tination of the passenger. 

Adjust the count of passengers on the bus 
alighting at that stop,* the number of 
passengers on the bus and the number 
embarking at the stop. 

Repeat until the bus is either full or no 
queue is left. 

Remove first passenger. 

Allow time S to load passengers. 

Adjust S for ‘late-comers’. 

Calculate rest of loading time. 

Reset count of passengers embarking. 
Commit bus. 

Return for other buses, if any. 




- a - t 


Figure 63 
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C2. Starting 

Find a bus which is available 
(but with no passengers to load). 

Calculate the next stop and 
commit bus for journey. 

Engage to B2. 

Figure 64 



In this example, the ordering to the C-activities is tapojajrt since 
buses which are available after Cl must be ready to depart. If the order 
were reversed, extra tests would be needed to establish that it was ready 

t0 ^"entities called machines need not be conventional machines. A 
powerful variant of the technique is available if a known group ofitems 

»»- - * * t of r““o E „fo”” ir: 

tSr/for P th C e7th process ’wiiich iakes a time given by a distribu- 

The J m items to be processed are called machines 1> •••>/>■• ■ m 
each has a state associated with it—the stage of processing reached. 
Suppose we require the intervals between items leaving the system 
assuming that items enter the system in order and are dealt with at each 
process in order. 


Machines 


S j: during /th process or waiting for (;+l)th 

process . .. . . 

Q. number of yth process machines available 

W last time item left system 
M count of items 


All machines are engaged to B 1. 


G E N E R A L SIM U L A TIO N P R O B L E M S 


165 


B\ 


Set / to state of returning 
item. 

Find an item which is 
available in state /—1, if 
any. 

Commit it to process /. 

If no such item, increase 
count of available 
machine of type /. 


If last process, print in¬ 
terval and test for last 
item. 


Otherwise, test if any 
machines available for 
next process. If so, re¬ 
duce count of such 
machines and commit 
item to the process. 



Figure 65 

No C-activities are required since the B-activity looks back to see if 
the present process has a queue (formed by earlier processes being 
completed when no machines were available) as well as forward to see 
if the next process can start. 
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By keeping further counts Cj, say, of the queue sizes, the unsuccessful 
searches can be avoided. 



If some of the processes involve, in addition to the basic processing 
machine, an ancillary machine, such as a crane, and a group of these 
are available for common service, then this simple model breaks down. 
However, it is still possible to retain the same basic structure. 

Associate with each process a constant k t taking value one if a crane 
is needed, zero otherwise. It is also necessary to specify in the event of 
competition which process shall have the use of a crane when it becomes 
available. This is achieved by associating with each process u a priority 
number p u and choose that process from those with non-empty queues 
and the necessary machines available which has minimum priority 
number. 

A C-activity is now necessary. The 5-activity becomes, using L for 
a count of cranes: 



B 1 


Set name of state and 
adjust count of cranes. 

If a crane was used or no 
queue for the process add 
to count of process machines. 


Otherwise find an item 
available in state j~ 1, 
adjust queue count and 
commit item to state / 

If last state dealt with, print. 


Consider next process. If a crane 
is wanted and none is available, 
adjust queue count. 

Otherwise if no crane required, or 
if one already idle and a process 
machine is available, adjust crane 
count (if required) and machine 
count and commit to next process. 



£j>°H 




•SAM R 


'7=kW-> P r 'nt and 
count O-M 


L' = L-kj 
Qj^-I 
Tj>T 0 +SAM Rj 

s/~-j 

Figure 67 
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Cl 

If a crane is available, find 
the process with minimum 
priority number for which a 
crane is needed and at least 
one machine and one item 
are available. 


Find such an item. 



A 


Reduce count of cranes, items 

[_' =L-1 

and machines and commit item to 

Cj -Cj-1 

selected process. 

Qj -Qj" 1 

T' =T 0 +SAM Rj 

% = J 


Figure 68 


The mechanism for stopping the simulation is now on the count M 
and, if the duration mechanism is built into some standard structure, 
the duration must be set very large to prevent it stopping the simulation 
prematurely. A little care is required with the initial conditions to 
start the simulation. 

If the conventional approach is taken a very long list of machines 
is produced (of length ]►>;), and as each machine finishes processing 
an item the long list of items must also be scanned to find a suitable 
one to process next. If one is not available, a second activity is required 
to search for a suitable machine when such an item is released by an 
earlier machine. 

The technique used throughout the examples of this and the preceding 
chapters has been one in which the simulated time is advanced by 
irregular steps to the next significant event and these steps are found 
by scanning a relevant set of times. An alternative procedure is to 
advance time by a regular amount and then determine if any event 
has occurred in the interval. 



GENERAL SIMULATION PROBLEMS , 169 

This method has the disadvantage that for an equivalent accuracy 
the number of time-advancing steps is greatly increased without reduc¬ 
ing the number of time comparisons made at each advance. Also the 
possibility of every activity is questioned at each time advance. 

The constant time step technique has now been discarded by most 
workers for the event tracing type of simulation. 

It is possible, however, to describe some processes at least approxi¬ 
mately by counts of the numbers of events in a fixed time period, and 
then constant time advance becomes desirable. 

A simple queue simulation can be constructed in this manner. 
Consider a long queue in front of a machine which can deal with X 
items per hour. The quantity X is considered a random variable, which 
can be directly observed from the plant. Similarly, the input to the 
queue can be described by the number of items Y arriving in each hour. 
Then the throughput in the rth hour is given by the recursion 

T r = min ( X n Z r + Y r ) 

where Z r is the stock at the beginning of the hour. This simply states 
that the throughput is the potential throughput or the total available, 
whichever is the lesser. 

The stock at the end of the hour is 

Z r+1 =Z r +Y r -T r = max (0,Z r +Y r -X r ) 

In these equations X n Y n are samples from two distributions. 

These formulae are only approximate as it is assumed that all the 
items to arrive will be available when the service needs them. In fact, 
part of the variability of the number processed arises because delays may 
be caused to the server by the lack of items to serve. Some allowance is 
made for this in the distribution of X, but without discrimination. Large 
values of X should not appear when T+Z is small, since this is when 
delays are likely to reduce the throughput. 

There are systems where the scheme is quite accurate. Consider a 
large stock which will never run out, replenished by arrivals and used 
by two processes in tandem. Assume from the nature of the product 
that no stock may accumulate between the processes. 

In the case when the natural throughput of each process is given by 
a distribution, then the throughput of the tandem process is 

min (X, Y) 




170 


THE ART OF SIMULATION 

If the stock is Z and the input is U per hour and the output is T , we 
have the recursion 

T r = min (X„ Y r ) 

Z, + i = Z r +U r —T r 

The difficulty, in this case, is to determine the free distribution of 
throughput for each process. It requires records showing periods when 
process 2 was waiting for items (to estimate for process 1) and periods 
when process 1 was held up continuously by process 2 (to estimate for 
process 2). 

Even in this case, there are end effects between successive hours 
which are not quite accurately dealt with. An exact number of pieces 
are rarely dealt with in each interval and the partial process times must 
be rounded up or down. A little care is needed to prevent bias. 

The technique is best used for quick surveys of a situation where the 
number of items involved in each interval is quite large (say, 20 to 50 
items). 



CHAPTER 13 


DESIGN OF SIMULATION EXPERIMENTS 1 

The preceding chapters have described how to produce efficient 
computing procedures for simulating a complex industrial planfEach 
un o such a procedure has produced a single vector of answers These 
can be thought of as a single statistic produced by a complex p ocedu e 
from the whole set of random variables which represented variouTtime 
P ocesses involved. Thus the value so produced Is liable to fluctuaSon 
and repeated runs are necessary to establish the magnitude of this 

Quoted y t0 redUCe the Sampling err ° r ° n the overa11 mean 

The simple procedure of repeating runs, treating these as independent 
sample values until variability has been reduced to the desired !evel is 
not usually feasible as the computing labour to obtain a single samnle 
value is quite high and the quantity under examination is often highly 
variable (in fact if it were not so, some estimate of its mean value could 
° Un . u a deterministlc approximation in most cases). 

tW h h S U b ® C ° meS desirable ^ utilise the variance reducing techniques 
that have been considered for Estimation in Chapter 9. The three 
techniques described there can be listed as the use of: 

(i) stratification; 

(ii) control variates; 

(Hi) antithetic variables. 

For successful stratification, it is necessary to devise strata for which 

strata Ire f° babllltles of a sam P Ie of input variables falling into the 
strata are known, or given that the input variables have so fallen the 

cTiT “ pec,5d va, " e t 

coI?v P ° SSibiIit u ° f determining a Partition for which either of these 
conditions can be met is remote in industrial simulation and will not 
be further constdered. There is, however, one similar device t l can 
occasionally be used. Suppose the simulation involves the use of a se" 
ofk similar machines and the rules for using them are symmetrical i e 

for a^oTa mn,r 0lVe ?’ T* ^ & Ch ° iCe ° f machines is availabI e 
job a random selection is made. Suppose moreover that the 
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utilisation of the machines is low, so that over the duration studied an 
appreciable proportion of the runs made (from randomised starting 

conditions) do not use all the machines. ... 

Now suppose that the number of runs involving machines i u i 2 , • • *> h 

is •>*'»•) and the avera § e res P onse from these runs . ls 

Z(i i . . ., Response is used here as a neutral term for the quantity 

being estimated. It may, for example, be a total delay time on a machine, 
an output, the maximum length of a queue, the number of delays or the 

mean delay on a machine. 

Now the usual estimate can be written as 


i,)Z(t i, i 


h)/I n{l 


. . •, i r ) 


where p is the set of all possible sub-sets of k objects. 

However, from the symmetry of the stratification, the probabi lty 
of falling in the class (i„ i 2 , ■■■, Q can be written as p r depending only 
on r, the number of machines, and not on the individual names of the 
machines. This fact can be utilised in the estimation by using the 
estimate: 

z = { f y hOl ...,i r )E Z(i 1 ,i 2 ,...,ir)l/I(«('i. > 2 . 0) 

l,=Op r r- }lp 

No satisfactory analysis of the reduction of the variance using this 
estimate seems available and great caution should be used to ensure 
that the symmetry conditions apply, otherwise bias can be introduced 
The application of the control variate is quite straightforward. Select 
any time distribution involved in the simulation which clearly affects 
the response. For example, if the response is the time taken by a simple 
queue system to deal with a fixed number of items, then an important 
distribution is the process time distribution. We would expect the 
average of the process times actually experienced m any run to be 
positively correlated with the response from that run. As another 
example, the average cycle time of a furnace will be positively correlated 
in a simulation run with the time to achieve a given output. 

Since the distribution from which these sample variables are drawn is 
known, the true average value is known and thus the conditions for the 

control variate techniques are all satisfied. . 

Moreover, the extra work required to calculate the control variate is 
quite small and in most cases can be neglected. In this case, the tech- 
nique always shows an economy. 
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In many cases, of which the two examples are typical, the natural 
response is negatively correlated with the control ...variate,The natural 
responses in the two examples are output in a fixed time (i.e. propor¬ 
tional to the reciprocal of the response suggested). This can be dealt 
with by noting that if y and z are negatively correlated —y and z are 
positively correlated. 

Thus the variable estimated is z + oty and a is chosen by the method 
of least squares. Formally there is no difference and the sign of the 
estimate of a is an indication of whether y should be regarded as posi¬ 
tively or negatively correlated with z. 

In experiments of this kind there are often several variates which 
can be used for control. The control variate technique can be extended. 
Suppose the response is z and the control variates are y u y 2 , . .y m . 
Then the variate estimated is 

m 

z + S “i y t 

i— 1 

and the coefficients « l5 oc 2 > * * •» can be estimated by the method of 
least squares as before. To avoid bias, the partition of the total sample 
into k equal parts can be used to prepare k estimates of oc u oc 2 , • * •> a m 
independent of the sub-sample means z, y u y 2 , . . ., y k . 

The technique is closely akin to that of the analysis of co-variance 
used to utilise the known values of concomitant variables. In this 
analysis, a linear law for E(z) as a function of y u y 2 , ...,y k is usually 
assumed, but the analysis of Chapter 8 shows that this is not strictly 
necessary for the validity of the method. If the linearity does hold the 
partitioning of the sample is unnecessary. However, in most cases this 
linearity is most unlikely, and the extra cost of the full analysis is justi¬ 
fied. 

Turning now to the method of antithetic variates, we note that if an 
effective control variate has been found, a run in which the samples 
from the distribution in question have tended to be greater than the 
true mean (leading to an average greater than the true mean) then the 
response will tend to be greater than its true mean. If a companion run 
can be made in which each sample from the distribution is negatively 
correlated with its mate, then the response from the run will tend to be 
less than its true mean. Similarly, if the samples from the first distribu¬ 
tion tend to be below average. Thus the two responses will be negatively 
correlated. 

It now remains to see how to ensure the negative correlation in the 
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samples from the control variate. If these are obtained by the inverse 
cumulative distribution transformation this is easy. If the uniform ran¬ 
dom variable £ chosen is greater than a half, the resulting variable is 
greater than the median (and hence tends to be greater than the mean). 
On the other hand a uniform variable less than a half leads to a variable 
less than the median. If the two runs are sampled by using and 1 — 
respectively, then the resulting variables are negatively correlated, 
f and 1 — £ are negatively correlated with coefficient — 1 and the non¬ 
linear transformation merely reduces the magnitude of the correlation 
but it will always remain negative. 

The advantage of using pseudo-random numbers for uniform 
random variables now becomes clear. This not only allows a run to be 
reproduced by starting each generation at the same initial value but also 
enables the negative correlation required to be obtained in two ‘anti¬ 
thetic’ runs. Each individual process is provided with its own generator 
giving uniform random variables u l9 u 2s . ..,w m for the m samples 
used in the run. 

In the first run the transformation is effected using u l9 u 2 , ..., u m 
as the independent variables. In the second run the variables used are 
1 — u l9 1 —u 2i 1 — w 3 , ..., 1 -w m . All other generations from other 
distributions are started at the same initial value in the two runs. 

In practice, the different times generated may involve a different 
number of samples being taken on the two runs, but this has no serious 
effect although it does tend to weaken the correlation effect. 

By symmetry we can argue that the variance of the response on 
these two runs will be equal and use their average as an estimate of the 
mean response. Even if in some obscure way the assumption of negative 
correlation is not justified the estimate is still valid. 

If a total of 2 n runs are to be made, suppose n pairs of runs are made 
yielding estimates z i9 z[. The starting conditions for each of a pair are 
identical, but the different pairs are started with arbitrary conditions 
including random choice of initial values for each pseudo-random 
number generator. 

The estimate of the response is merely 


I (z,+*D 

2n i = i 


An estimate of variability is required and the best is obtained from 
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SSomwnSS'SeSS ^ ^ reSP ° nSe Starting from 


s' 2 = 


-2{(z i +z ! 0-(z + z')} : 


4 (n~\) 

IS not valid for this purpose although it is the correct estimate to me 
when estimating the sampling error in 2. 

An extension to utilise several control variates is possible it L u * 

distributions A’l R2 m- m m j 1 If there are k 

ii ’ KI ’ •••>■«« to be used as sources of antithetic vnri 

ables we may proceed thus. Choose a sample size » 2* Tn r 

PT ^ZS each * ^ Ct ° ria ^ 
For k = 3, the following table illustrates the arrangement 


SAMPLE 

1 

2 

3 

4 

*1 Wl 

0 

1 

0 

1 

R2 u 2 

0 

0 

1 

1 

A3 «, 

0 

0 

0 

0 


6 7 

1 0 

0 1 

1 1 


square deviations of the sets of* commonly treated runs 

imprecise t? f 7* fairly high 80 that this latter estimate is not too 
imprecise, the size of k is restricted to 2 or 3 in practice. 

ine control variate technique may be applied to the a 

>bl. , «i„ E d,h„ the dJ, ib „S o, p'.ftb" "fo Z 

Even if m any particular case, these techniques do not cause any 
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drastic reduction in variation, they cost so little to implement that they 
are always to be recommended. 

The methods of generating starting conditions merit some comment. 

In a complex simulation it is very difficult to concoct sensible starting 
conditions if these are intended to be typical of the state of the plant 
when inspected at random. If the plant closes down at frequent intervals 
(each day or each weekend) then the real starting conditions are easy 
to determine and each run can be started from there. But if the plant 
runs are very long a start of the simulation from ‘typical’ conditions is 

required. t ^ , 

There can be no general theory of how to do this but the technique 

most favoured is to invent starting conditions and allow the simulation 
to proceed for some time and take the final conditions as the initial 
conditions of the genuine run. This raises the question of how long to 
make the preliminary run and again no definite answer is possible. A 
general requirement is that the longest cycle in the plant should have 
been executed at least 3 or 4 times before transient abnormal behaviour 
induced by non-sensible starting conditions can be expected to have 

died away. ..run 

It also needs a decision whether successive runs shall consist ot wholly 

independent runs (started as described above) or shall be obtained by 
using the final calculation of one run as the start of the next. This 
should certainly not be done if real plant runs are only short and the 
continued running of the plant may reveal instabilities that are never 
given time to develop in the real plant. For example, a queue may operate 
with input faster than average output but this will not matter if the 
remaining items at plant close down are dealt with differently during 

the shut-down. , ,, .u 

The theoretical solution to this problem depends on whether the 
response in one period is positively or negatively correlated with that 
in the next period. In the former case, separate starts should be used in 
the interest of variance reduction. In the latter case, continued runs are 
better. In practice, the labour of making valid fresh starts weighs heavily 
in favour of continued runs. 

An allied question is the length of time that should constitute a run. 
This should be related to the purpose of the simulation and is rarely 
settled by statistical considerations. A plant with short real runs should 
be simulated for complete runs so that end effects are consistently 
dealt with. Plant with long runs can be dealt with by shifts, days or 
weeks. In case of doubt as to which is an appropriate duration, the 
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shorter should always be chosen as results can always be amalgamated 
but not broken down. The extra printing involved should not be an 
intolerable burden. If it appears so, the responses should be stored in 
the computer and re-processed to give a small amount of output. 

So far it is assumed that the purpose of the simulation is to estimate 
some quantity as accurately as possible. In the first stages of an in¬ 
vestigation this is necessary to test a simulation of an actual plant as a 
means of verification of the validity of the model. However, a model 
that exactly reproduces an existing plant is of little value (except 
possibly for educational and training purposes), since the plant itself 
can give all the information that the model can. 

The second stage of the investigation is to change the model to 
represent a proposed change of the real plant. The purpose of the 
simulation runs then is to measure the effect of this on the response 
The interest now shifts from absolute values to comparisons and we 
accordingly turn to the problem of designing the runs to enhance the 
accuracy of such comparisons. 

Consider the simple situation in which two models are to be com¬ 
pared. It is assumed that the models differ only in some smaU part 
(even though we hope it to be an important difference!) and that most 
of the input distributions are the same. The usual differences are either 
m the number of ancillary machines offering service or in the rules of 
behaviour of the machines under different circumstances. 

Clearly two runs, one on each model, are likely to give a better esti¬ 
mate of the difference between the models if initial conditions and 
process times are the same than if not. 

This pair of runs is made from the same initial conditions and using 
the same initial pseudo-random numbers for the distributions (if one 
model contains extra or different distributions a random choice is made 
for these). The difference of the responses is taken and this is treated as 
a response and the pair of runs treated as a single run. Both the control 
variate technique and the antithetic treatment can be used and se¬ 
quential estimation used, if desired. 

As before, if variance reducing techniques are used, careful distinction 
must be made between the sampling error of the estimate and the 
variance of the statistic in the population. 

The same process can be applied if three or more different models 
are to be tested. 

If there are t models then n sets of / runs are made, each member of 
each set starting from common initial conditions and using a different 
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one of the k models. Differences between averages for the n runs on 
each model give estimates of the model differences in which a large 
amount of the variability has been removed. 

The number of sets, n, should be kept as large as possible in order 
that the differences are measured over as widely a differing set of condi¬ 
tions as possible and to enable the population variability of the re¬ 
sponses to be measured. If t becomes large, this implies a very large 
number, nt> runs and this may be prohibitive of labour. 

The solution then lies in using n incomplete sets of runs—incomplete 
in the sense that not all k models are represented in each set. In fact, 
we adapt the theory of block experiments to the situation. The blocks 
consist of the sets of k runs with common initial conditions, the 
treatments are the t different models and the plots are the individual 
runs. 

Our treatment of the case / = k = 2 is a paired comparison experi¬ 
ment. The case of small k = t is dealt with by a complete randomised 
block experiment. For large t we must use incomplete block experi¬ 
ments. 

Randomisation would play no useful role in the experiments as so 
far described since each run of a set will produce an identical response 
if ‘treated’ with the same model. The purpose of randomisation is to 
make that true of the expected response. There may be some merit in 
not making the initial conditions of the runs in a set identically equal 
but allowing some minor variations so that the assumptions of the 
analysis technique used for the experiments are approximately valid. 

The whole repertoire of designs available for real experiments is now 
open to us in our simulation experiments. The whole subject is too 
large to be dealt with here and the reader must look elsewhere for more 

detailed information. . 

However, one design is so useful that a brief description is desirable. 
This is a special case of a balanced incomplete block design. In general 
this is a design arranged so that every model is run equally often and 
every pair of models are run together in the same number of blocks. 
The simplest case of this is when k — 2 and each block is just one pair. 
Since there are k C 2 pairs the number of blocks is a multiple of this. 
For example with 5 models (labelled 1, 2, 3, 4, 5) the 10 blocks of a 
single replicate consist of the pairs (1,2), (1, 3), (1,4), (1, 5), (2, 3), 
(2, 4), (2, 5), (3, 4), (3, 5), (4, 5). At least two replicates are desirable 
if the population variability of any difference is to be estimated. The 
analysis will ordinarily assume equal variability of response in all 
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models and hence in all differences. To test this would require many 
replicates. 

This design and the extension of it, known as partially balanced 
designs, assume no structure to the set of models under test. However, 
often this set has been generated in some systematic way. For example, 
the models may be developed from the real plant by adding a storage 
tank of capacities 10,15, 20, 25 units, by adding 1, 2, 3 extra cranes in a 
machine shop, or by trying two variants of a queue discipline on each 
of three queues (8 models in all). 

In the first examples, the ultimate purpose will be to draw a graph 
of response against an independent variable. In the last example, we 
have a factorial arrangement of 3 factors (queue) at 2 levels. This is a 
very common structure to experiments and the theory of the design 
of experiments has concentrated a good deal of attention on the prob¬ 
lem of designing new experiments. 

Once again the reader must look elsewhere for details. 

In any experiment with t models, although there are differ¬ 

ences that can be studied, only t—1 of these are independent and there 
is some skill in choosing the appropriate combinations. For example, 
if z lf z 2 , z 3 are the three responses with 1, 2, 3 extra cranes, the com¬ 
ponent i( z 3 ~ z i) measures the average increase in response per extra 
crane while z 3 — 2z 2 + z 1 — (z 3 — z 2 ) — (z 2 —z l ) measures the change in 
increase of response as the nilmber of cranes increase. 

The independence of the t—1 differences is in an algebraic sense. This 
is quite distinct from the statistical dependence or correlation between 
estimates of various differences. It is the distinguishing feature of com¬ 
plete block designs that algebraic independence implies statistical 
independence (at least under some mild assumptions). All incomplete 
block designs give correlated estimates of responses. Of course, it is 
possible to determine special comparisons (of differences) that are 
independent, but unless the experiment has been specially designed to 
achieve this, they are unlikely to be comparisons that are of interest. 

The whole field of experimental design for simulation experiments is 
in its infancy and offers a fertile field for further research. 
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